FiveTech Software tech support forums

by **Jeff Barnes** » Tue Mar 15, 2016 7:45 pm

I am looking for a way to extract text from a text file and was wondering if something already exists to do what I need.

I was looking for something like:

cText := TextExtract( cItem1, cItem2)

cItem1 and cItem2 would be the strings that surround the text I am after.

Example text from the file:
<system_name>MYPC101</system_name>

So I would like to do something like this:
cText := TextExtract( "<system_name>", "</system_name>" )

And it would return cText as "MYPC101"

Any ideas?
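A minimal sketch of the TextExtract() described above. It assumes Harbour's hb_At(), which accepts a start position, and adds a third parameter, cText, for the text to search, because the proposed two-argument call has nowhere to pass the file contents. It returns only the first match, compares case-sensitively, and returns "" when either tag is missing:

```harbour
// Hypothetical helper based on the TextExtract() idea above.
// Returns the text between the first cStartTag and the next cEndTag
// in cText, or "" if either tag is missing. Case-sensitive.
FUNCTION TextExtract( cStartTag, cEndTag, cText )

   LOCAL nStart, nEnd

   nStart := At( cStartTag, cText )
   IF nStart == 0
      RETURN ""
   ENDIF
   nStart += Len( cStartTag )                 // first character after the start tag

   // Assumption: Harbour's hb_At( cSearch, cString, nStart ) searches from
   // nStart and returns a position relative to the whole string.
   nEnd := hb_At( cEndTag, cText, nStart )
   IF nEnd == 0
      RETURN ""
   ENDIF

   RETURN SubStr( cText, nStart, nEnd - nStart )
```

Usage might look like `cText := TextExtract( "<system_name>", "</system_name>", hb_MemoRead( "sysinfo.xml" ) )`, where "sysinfo.xml" is a hypothetical file name. With the example line above, this returns "MYPC101".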

by **cnavarro** » Tue Mar 15, 2016 8:04 pm

See if this helps:

viewtopic.php?f=6&t=31216&p=180781&hilit=leer+html#p180780

by **rhlawek** » Tue Mar 15, 2016 10:49 pm

Jeff,

Here is a function I wrote a long time ago to do exactly this. It returns what it finds in an array of strings, not a single string. I suppose it could be optimized somewhat, but other than pre-allocating the array to avoid a lot of AAdd() calls, I've never needed to improve it. If I were going to optimize anything, the first step would be to keep track of the offset into the string instead of trimming the front off the input strings after each match.

As is, I often return arrays of 15,000 to 20,000 strings at a time when parsing various logs and XML files. Some of the logs are quite large (50+ MB), and large logs mean large memory allocations. Still, I typically just read the entire file into cInputString and process it all in one pass. I do have a version that finds the first instance of matching tags and returns that single instance as a string, but I hardly ever use it.

As written, it creates local upper-case copies of the input string and the tags and matches on those (a case-insensitive match), but it returns what it finds in the original case.

```harbour
#if ! defined( DEFAULT_MAX_RECORDS )
   #define DEFAULT_MAX_RECORDS 20000
#endif

FUNCTION BETWEENTAGSARRAY( cStartTag, cEndTag, cInputString, lIncludeTags )

   LOCAL nStartPoint, nEndPoint
   LOCAL nRecords := 00, nFetchLength := 00, aFoundText := Array( DEFAULT_MAX_RECORDS )
   LOCAL cMDML
   LOCAL cInputStringUpper := Upper( cInputString )
   LOCAL cStartTagUpper := Upper( cStartTag )
   LOCAL cEndTagUpper := Upper( cEndTag )

   hb_Default( @lIncludeTags, .F. )

   DO WHILE .T.
      // Find the starting point of the starting tag.
      nStartPoint := At( cStartTagUpper, SubStr( cInputStringUpper, 01 ) )
      IF nStartPoint > 00
         // Adjust starting point to end of starting tag
         nStartPoint += Len( cStartTagUpper )
         // If the first tag is found strip off string up to and including the starting tag itself
         cInputStringUpper := SubStr( cInputStringUpper, nStartPoint )
         cInputString := SubStr( cInputString, nStartPoint )
         // Find the starting point of the second tag, beginning from end of first tag.
         nEndPoint := At( cEndTagUpper, cInputStringUpper )
         IF nEndPoint > 00
            // If the second tag is found calculate its position from start of string.
            nFetchLength := nEndPoint - 1
            IF lIncludeTags
               cMDML := cStartTag + LTrim( SubStr( cInputString, 01, nFetchLength ) ) + cEndTag
            ELSE
               cMDML := LTrim( SubStr( cInputString, 01, nFetchLength ) )
            ENDIF
            IF ++nRecords <= DEFAULT_MAX_RECORDS
               aFoundText[ nRecords ] := cMDML
            ELSE
               // IF we get here it is gonna be oh so slow.
               AAdd( aFoundText, cMDML )
            ENDIF
            // clip off the front of the string then loop to find the next
            cInputStringUpper := SubStr( cInputStringUpper, nFetchLength + 01 )
            cInputString := SubStr( cInputString, nFetchLength + 01 )
         ELSE
            EXIT
         ENDIF
      ELSE
         EXIT
      ENDIF
   ENDDO

   IF nRecords < DEFAULT_MAX_RECORDS
      aFoundText := ASize( aFoundText, nRecords )
   ENDIF

   RETURN ( aFoundText )
```

Robb
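A minimal sketch of the optimization Robb mentions, tracking an offset into the string instead of trimming the front after each match. It assumes Harbour's hb_At( cSearch, cString, nStart ), which returns positions relative to the whole string. BetweenTagsOffset() is a hypothetical name; it keeps the case-insensitive matching, LTrim() and lIncludeTags behavior of BETWEENTAGSARRAY(), but uses plain AAdd() instead of pre-allocating and resumes each search after the end tag:

```harbour
// Hypothetical offset-tracking variant of BETWEENTAGSARRAY().
// Matching is case-insensitive; returned text keeps the original case.
// Assumes Upper() does not change the string length (single-byte codepage),
// so positions found in cUpper are valid in cInputString.
FUNCTION BetweenTagsOffset( cStartTag, cEndTag, cInputString, lIncludeTags )

   LOCAL aFound   := {}
   LOCAL cUpper   := Upper( cInputString )
   LOCAL cStartUp := Upper( cStartTag )
   LOCAL cEndUp   := Upper( cEndTag )
   LOCAL nOffset  := 1
   LOCAL nStart, nEnd, cFound

   hb_Default( @lIncludeTags, .F. )

   DO WHILE .T.
      // Search for the start tag from the current offset only
      nStart := hb_At( cStartUp, cUpper, nOffset )
      IF nStart == 0
         EXIT
      ENDIF
      nStart += Len( cStartUp )               // first character after the start tag

      nEnd := hb_At( cEndUp, cUpper, nStart )
      IF nEnd == 0
         EXIT
      ENDIF

      cFound := LTrim( SubStr( cInputString, nStart, nEnd - nStart ) )
      IF lIncludeTags
         cFound := cStartTag + cFound + cEndTag
      ENDIF
      AAdd( aFound, cFound )

      nOffset := nEnd + Len( cEndUp )         // resume just after the end tag
   ENDDO

   RETURN aFound
```

Called as `aNames := BetweenTagsOffset( "<system_name>", "</system_name>", hb_MemoRead( "big.log" ) )` (the file name is hypothetical), the input strings are never copied after the initial Upper(), which matters most for the 50+ MB logs mentioned above.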

by **James Bott** » Tue Mar 15, 2016 11:43 pm

Jeff,

See: FWH\samples\xmlreader.prg

This is a sample XML document reader.

James

by **rhlawek** » Wed Mar 16, 2016 2:36 am

A lot of what I pull out of logs is XML, but it typically gets written as a single line in the log, not as a clean XML document. That is actually why I wrote this function, and also why it has a switch to keep or drop the tags in the returned strings. With XML I typically want the tags, but with other raw logs I do not. I do use the TXMLDocument class, which samples\xmlreader.prg uses, to parse the XML after it is extracted.
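The extract-then-parse approach described above might look like the sketch below. It assumes that Harbour's TXmlDocument():New() accepts the XML as a string (the FWH sample passes a file handle instead) and that FindFirst( cName ) returns the first matching node or NIL. TagDataFromFragments() is a hypothetical helper; feed it the output of BETWEENTAGSARRAY( ..., .T. ) so each fragment still carries its tags:

```harbour
// Hypothetical helper: parses each extracted XML fragment and collects
// the data of the first node named cTagName in each one.
FUNCTION TagDataFromFragments( aFragments, cTagName )

   LOCAL aData := {}
   LOCAL cFragment, oDoc, oNode

   FOR EACH cFragment IN aFragments
      // Assumption: New() parses a string argument as XML text
      oDoc  := TXmlDocument():New( cFragment )
      // Assumption: FindFirst() returns NIL when no node matches
      oNode := oDoc:FindFirst( cTagName )
      IF oNode != NIL
         AAdd( aData, oNode:cData )
      ENDIF
   NEXT

   RETURN aData
```

Parsing each fragment separately keeps one malformed log line from spoiling the rest, at the cost of one TXmlDocument object per fragment.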

by **Jeff Barnes** » Wed Mar 16, 2016 5:02 pm

Thanks Robb. With some slight fine-tuning (less than 5 minutes) it does exactly what I need.

James, I couldn't look at the sample xmlreader.prg because I don't seem to have it in my samples folder.
Maybe my FWH version didn't include it.

by **Antonio Linares** » Wed Mar 16, 2016 9:19 pm

Jeff,

FWH\samples\xmlreader.prg

```harbour
// Simple example for a generic XML reader

#include "FiveWin.ch"

function Main()

   local hFile := FOpen( "test.xml" )
   Local oXmlDoc := TXmlDocument():New( hFile )
   Local oXmlIter := TXmlIterator():New( oXmlDoc:oRoot ), oTagActual

   while .T.
      oTagActual = oXmlIter:Next()
      If oTagActual != nil
         MsgInfo( oTagActual:cName, oTagActual:cData )
         HEval( oTagActual:aAttributes, { | cKey, cValue | MsgInfo( cKey, cValue ) } )
      Else
         Exit
      Endif
   End

   FClose( hFile )

return nil
```

Extract Text