I am trying to take some HTML-formatted text (scraped from a web page), convert it to plain text, and remove all the extra spaces. With this code I have removed the tags (for the most part), but I cannot remove the spaces to get readable text.
Any advice would be appreciated.
Thanks
Rick Lipkin
Code:
cDESC := upper(IE:document:documentElement:outerHTML)

// clean up the text
nLOOP := 0
DO WHILE .T.
   IF AT('<TD CLASS='+'"'+'STD'+'">' , cDESC) > 0
      cDESC := STRTRAN( cDESC, '<TD CLASS='+'"'+'STD'+'">', space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 10
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<TD>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<TD>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</TD>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</TD>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</TR>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</TR>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<TR>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<TR>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<BR>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<BR>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<LI>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<LI>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</LI>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</LI>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<B>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<B>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 40
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</B>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</B>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 40
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</UL>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</UL>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</TBODY>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</TBODY>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<TBODY>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<TBODY>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("</TABLE>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "</TABLE>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 30
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT("<TABLE border=0 cellSpacing=0 cellPadding=1>", cDESC) > 0
      cDESC := STRTRAN( cDESC, "<TABLE border=0 cellSpacing=0 cellPadding=1>", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 2
      EXIT
   ENDIF
ENDDO

MSGINFO( CDESC )

IF AT( (chr(13)+chr(10)), cDESC) > 0
   MSGINFO( "FOUND CR LINE FEED")
ELSE
   MSGINFO( "NOT FOUND CR LINE FEED") // can not be found
ENDIF

IF AT( (chr(13)), cDESC) > 0
   MSGINFO( "FOUND CR FEED")
ELSE
   MSGINFO( "NOT FOUND CR FEED") // cannot be found
ENDIF

IF AT( " ", cDESC) > 0
   MSGINFO( "FOUND SPACE(2)")
ELSE
   MSGINFO( "NOT FOUND SPACE(2)") // can not be found
ENDIF

nLOOP := 0
DO WHILE .T.
   IF AT( (chr(13)+chr(10)), cDESC) > 0
      cDESC := STRTRAN( cDESC, (chr(13)+chr(10)), space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 500
      EXIT
   ENDIF
ENDDO

nLOOP := 0
DO WHILE .T.
   IF AT( " ", cDESC) > 0
      cDESC := STRTRAN( cDESC, " ", space(0) )
   ENDIF
   nLOOP++
   IF nLOOP > 5000
      EXIT
   ENDIF
ENDDO
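
Two things stand out in the code above. STRTRAN() already replaces every occurrence in a single call, so the counting loops add nothing. And because the text has been passed through upper(), the mixed-case pattern "<TABLE border=0 cellSpacing=0 cellPadding=1>" can never match. A more compact way to do the same job might be to drop everything between "<" and ">" in one pass. The following is only a rough sketch: StripTags() is a hypothetical helper name, not a FiveWin or Harbour function, and it assumes the text contains no literal "<" characters outside of tags.

```harbour
// Hypothetical helper (not part of FiveWin or Harbour).
// Removes every "<...>" run and leaves one space in its place so that
// words from neighbouring cells do not run together.
// Assumption: "<" never appears in the visible text itself.
FUNCTION StripTags( cHtml )

   LOCAL cOut := ""
   LOCAL nOpen, nClose

   DO WHILE ( nOpen := At( "<", cHtml ) ) > 0
      // keep the text in front of the tag
      cOut  += Left( cHtml, nOpen - 1 ) + Space( 1 )
      cHtml := SubStr( cHtml, nOpen + 1 )

      nClose := At( ">", cHtml )
      IF nClose == 0
         // unterminated tag: discard what is left
         cHtml := ""
         EXIT
      ENDIF

      // continue after the closing ">"
      cHtml := SubStr( cHtml, nClose + 1 )
   ENDDO

RETURN cOut + cHtml
```

It would be called as cDESC := StripTags( cDESC ) in place of the individual tag loops. It leaves blanks behind where the tags were, so the whitespace still has to be cleaned up afterwards.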
Code:
<TD CLASS="STD">DESCRIPTION:</TD>
<TD CLASS="STD">SPINDLE ASSEMBLY - HEAVY DUTY<BR>AYP 130794</TD>
</TR>
<TR>
<TD CLASS="STD">PACK SIZE:</TD>
<TD CLASS="STD">1</TD>
</TR>
<TR>
<TD CLASS="STD">LIST PRICE:</TD>
<TD CLASS="STD">$48.96</TD>
</TR>
<TR>
<TD CLASS="STD">REPLACES (OEM):</TD>
<TD CLASS="STD"> AYP 130794<BR> HUSQVARNA 532 13 07-94<BR> </TD>
</TR>
<TR>
<TD CLASS="STD">FITS MODELS:</TD>
<TD CLASS="STD"><B>AYP</B> 36", 38" AND 42" CUT VENTILATED DECKS USING STAR SHAPED CENTER HOLE BLADES<BR><B>HUSQVARNA</B> 36", 38" AND 42" CUT VENTILATED DECKS USING STAR SHAPED CENTER HOLE BLADES</TD>
</TR>
<TR>
<TD CLASS="STD"> SPECS: </TD>
<TD>
<TABLE BORDER="0" CELLSPACING="0" CELLPADDING="1">
<TBODY><TR>
<TD>
<UL STYLE="MARGIN-LEFT: 20PX;">
<LI>HEIGHT:7" </LI>
<LI>HEAVY DUTY VERSION OF OUR 285-456</LI>
<LI>INCLUDES PULLEY NUT, BLADE BOLT, WASHER AND SPACER</LI>
<LI>NO THREADS, SELF TAPPING</LI>
<LI>USES 275-280 SPINDLE PULLEY FOR 38" CUT DECKS</LI>
<LI>USES 275-284 SPINDLE PULLEY FOR 42" CUT DECKS</LI>
</UL>
</TD>
</TR>
</TBODY></TABLE>
</TD>
</TR>
<TR>
<TD CLASS="STD">
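
One possible reason the blanks in text like this cannot be found is that they are not plain spaces or CR/LF pairs at all, but tabs, bare line feeds (chr(10) without chr(13)), or "&nbsp;" entities. That is a guess, not something confirmed from the page. Under that assumption, a minimal sketch for normalizing the whitespace could look like this. CollapseBlanks() is a hypothetical helper name, not an existing FiveWin or Harbour function.

```harbour
// Hypothetical helper (not part of FiveWin or Harbour).
// Turns the usual whitespace candidates into plain spaces, then squeezes
// runs of spaces down to a single space.
// Assumption: the stray blanks are tabs, CR, LF or &nbsp; entities.
FUNCTION CollapseBlanks( cText )

   cText := StrTran( cText, "&NBSP;", Space( 1 ) )   // entity after upper()
   cText := StrTran( cText, "&nbsp;", Space( 1 ) )   // entity as written
   cText := StrTran( cText, Chr( 9 ), Space( 1 ) )   // tab
   cText := StrTran( cText, Chr( 13 ), Space( 1 ) )  // carriage return
   cText := StrTran( cText, Chr( 10 ), Space( 1 ) )  // line feed

   // Here a loop is needed: one pass over four spaces leaves two behind.
   DO WHILE At( Space( 2 ), cText ) > 0
      cText := StrTran( cText, Space( 2 ), Space( 1 ) )
   ENDDO

RETURN AllTrim( cText )
```

Used as cDESC := CollapseBlanks( cDESC ) once the tags are gone. Replacing the line breaks with a space rather than space(0) keeps the words on neighbouring lines from being glued together. If the line structure should survive, the chr(13) and chr(10) replacements can be left out.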