Jeff,
Here is a function I wrote a long time ago to do exactly this. It returns what it finds in an array of strings, not a single string. I suppose it could be optimized somewhat, but other than pre-allocating the array to avoid a bunch of AAdd() calls, I've never had the need to improve on it. If I were going to optimize anything, the first thing would be to keep track of the offset into the string instead of trimming the front off the input strings after each match.
As is, I often return arrays of 15,000 to 20,000 strings at a time when parsing various logs and XML files. Some of the logs are quite large, 50+ MB, and large logs mean large memory allocations. Still, I typically just read the entire file into cInputString and process it all in one pass. I do have a version that finds the first instance of matching tags and returns that single instance as a string, but I hardly ever use that version.
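That single-match version is not shown here. A minimal sketch of the idea might look like the following. The name BetweenTagsFirst is hypothetical, and it assumes Harbour's hb_At() with a start-position argument is available. It illustrates the approach and is not the version referred to above.

```harbour
// Hypothetical single-match variant (illustration only).
// Returns the text between the first cStartTag/cEndTag pair,
// or an empty string when no complete pair is found.
FUNCTION BetweenTagsFirst( cStartTag, cEndTag, cInputString, lIncludeTags )

   LOCAL cUpper := Upper( cInputString )
   LOCAL nStart, nEnd
   LOCAL cFound := ""

   hb_Default( @lIncludeTags, .F. )

   // Case-insensitive search for the start tag
   nStart := At( Upper( cStartTag ), cUpper )
   IF nStart > 0
      // Step past the start tag, then look for the end tag from there
      nStart += Len( cStartTag )
      nEnd := hb_At( Upper( cEndTag ), cUpper, nStart )
      IF nEnd > 0
         // Slice the original string so the original case is preserved
         cFound := LTrim( SubStr( cInputString, nStart, nEnd - nStart ) )
         IF lIncludeTags
            cFound := cStartTag + cFound + cEndTag
         ENDIF
      ENDIF
   ENDIF

   RETURN cFound
```

Like the array version, this left-trims the matched text and only searches with the uppercase copy, never returning it.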
As written, it creates local uppercase copies of the input string and the tags and matches on those, so the search is case-insensitive, but it returns what it finds in the original case.
#if ! defined( DEFAULT_MAX_RECORDS )
   #define DEFAULT_MAX_RECORDS 20000
#endif

FUNCTION BETWEENTAGSARRAY( cStartTag, cEndTag, cInputString, lIncludeTags )

   LOCAL nStartPoint, nEndPoint
   LOCAL nRecords := 00, nFetchLength := 00, aFoundText := Array( DEFAULT_MAX_RECORDS )
   LOCAL cMDML
   LOCAL cInputStringUpper := Upper( cInputString )
   LOCAL cStartTagUpper := Upper( cStartTag )
   LOCAL cEndTagUpper := Upper( cEndTag )

   hb_Default( @lIncludeTags, .F. )

   DO WHILE .T.
      // Find the starting point of the starting tag.
      nStartPoint := At( cStartTagUpper, cInputStringUpper )
      IF nStartPoint > 00
         // Adjust starting point to end of starting tag
         nStartPoint += Len( cStartTagUpper )
         // If the first tag is found strip off string up to and including the starting tag itself
         cInputStringUpper := SubStr( cInputStringUpper, nStartPoint )
         cInputString := SubStr( cInputString, nStartPoint )
         // Find the starting point of the second tag, beginning from end of first tag.
         nEndPoint := At( cEndTagUpper, cInputStringUpper )
         IF nEndPoint > 00
            // If the second tag is found, the text to fetch is everything in front of it.
            nFetchLength := nEndPoint - 1
            IF lIncludeTags
               cMDML := cStartTag + LTrim( SubStr( cInputString, 01, nFetchLength ) ) + cEndTag
            ELSE
               cMDML := LTrim( SubStr( cInputString, 01, nFetchLength ) )
            ENDIF
            IF ++nRecords <= DEFAULT_MAX_RECORDS
               aFoundText[ nRecords ] := cMDML
            ELSE
               // If we get here it is gonna be oh so slow.
               AAdd( aFoundText, cMDML )
            ENDIF
            // Clip off the matched text (the end tag stays at the front), then loop to find the next
            cInputStringUpper := SubStr( cInputStringUpper, nFetchLength + 01 )
            cInputString := SubStr( cInputString, nFetchLength + 01 )
         ELSE
            EXIT
         ENDIF
      ELSE
         EXIT
      ENDIF
   ENDDO

   // Shrink the pre-allocated array down to the number of matches actually found
   IF nRecords < DEFAULT_MAX_RECORDS
      aFoundText := ASize( aFoundText, nRecords )
   ENDIF

   RETURN ( aFoundText )
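
The offset-tracking optimization mentioned at the top could look roughly like the sketch below. It is untested against real logs. The name BetweenTagsByOffset is hypothetical, and it assumes Harbour's hb_At(), which accepts a start position, so the big strings are never re-sliced. For brevity it uses plain AAdd() instead of the pre-allocated array.

```harbour
// Hypothetical offset-tracking variant of BETWEENTAGSARRAY (illustration only).
// Instead of trimming the front off both strings after every match,
// it keeps a running search position and leaves the strings untouched.
FUNCTION BetweenTagsByOffset( cStartTag, cEndTag, cInputString, lIncludeTags )

   LOCAL cUpper    := Upper( cInputString )   // single uppercase copy
   LOCAL cStartUp  := Upper( cStartTag )
   LOCAL cEndUp    := Upper( cEndTag )
   LOCAL nStartLen := Len( cStartTag )
   LOCAL nPos      := 1                       // where the next search begins
   LOCAL nStart, nEnd, cFound
   LOCAL aFound    := {}

   hb_Default( @lIncludeTags, .F. )

   DO WHILE .T.
      // Look for the start tag at or after the current offset
      nStart := hb_At( cStartUp, cUpper, nPos )
      IF nStart == 0
         EXIT
      ENDIF
      // First character after the start tag
      nStart += nStartLen

      // Look for the end tag after the start tag
      nEnd := hb_At( cEndUp, cUpper, nStart )
      IF nEnd == 0
         EXIT
      ENDIF

      // Slice the original string so the original case is preserved
      cFound := LTrim( SubStr( cInputString, nStart, nEnd - nStart ) )
      IF lIncludeTags
         cFound := cStartTag + cFound + cEndTag
      ENDIF
      AAdd( aFound, cFound )

      // Resume at the end tag, mirroring the function above,
      // which leaves the end tag at the front of the remaining text
      nPos := nEnd
   ENDDO

   RETURN aFound
```

Called as BetweenTagsByOffset( "<id>", "</id>", cInputString ), it should return the same array as the function above for the same input, with the per-match cost limited to the search and the copy of the matched text.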
Robb