Fulltextsearch (to Marco Boschi)

Postby Maurizio » Tue Nov 09, 2010 3:41 pm

Hello,
with xHarbour you have to use HSX.LIB.

Maurizio
www.nipeservice.com

Postby James Bott » Tue Nov 09, 2010 4:14 pm

Stefan,

In your example, what is the variable HS: an object or an order number?

Can these indexes be part of a single index file and be kept up to date automatically, just like any other CDX index?

Regards,
James

Postby frose » Tue Nov 09, 2010 7:00 pm

James and all others,

here is the syntax of HS_Index():
HS_Index( <cFileName>   , ;
          <cExpression> , ;
         [<nKeySize>]   , ;
         [<nOpenMode>]  , ;
         [<nBufferSize>], ;
         [<lCaseInsens>], ;
         [<nFilterSet>]   ) --> nErrorCode
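so HS holds a number, not an object: a non-negative value is the handle of the new index, and a negative value is an error code.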

I ran some tests with HiPer-SEEK and WildMatch some time ago and decided to use WildMatch.

There is no need to maintain an extra index file, and it's very fast if you have defined the indexes accordingly.

See also: viewtopic.php?f=3&t=1546
Windows 11 Pro 22H2 22621.1848
Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384
Harbour 3.2.0dev (r2008190002)
FWH 23.10 x86

Postby Otto » Tue Nov 09, 2010 7:11 pm

Hello James,
>Does this include memo fields?
Did you find out if memo fields are included?
Best regards,
Otto

Thanks to all for the help!
********************************************************************
mod harbour - Vamos a la conquista de la Web
modharbour.org
https://www.facebook.com/groups/modharbour.club
********************************************************************

Postby frose » Tue Nov 09, 2010 7:28 pm

Otto,

take a look at this HiPer-SEEK example:
Code:
// The example creates a new, populated HiPER-SEEK index for
// a customer database.

   PROCEDURE Main
      LOCAL cIndex, nHandle

      CLS
      USE Customer ALIAS Cust

      cIndex := 'Trim(Cust->LastName) +'
      cIndex +=  '" "'
      cIndex += '+ Trim(Cust->FirstName)'

      nHandle := HS_Index( "Customer.hsx", cIndex, 3, 0, 16 )

      IF nHandle >= 0
         ? "HiPer-SEEK index successfully created with"
         ?? HS_KeyCount( nHandle), "index entries"
         HS_Close( nHandle )
      ELSE
         ? "HiPer-SEEK index creation failed with error:", nHandle
      ENDIF

      USE
   RETURN

So, if the content of the memo field is part of <cIndex>, you can search memo fields.

I think it works analogously for WildMatch; see this example:
Code:
// The example demonstrates the pattern matching algorithm employed
// by WildMatch() and how the function can be used as filter condition
// for a database

   PROCEDURE Main
      LOCAL cStr := "The xHarbour compiler"

      ? WildMatch( "bo?"  , cStr, .F. )  // result: .F.
      ? WildMatch( "bo?"  , cStr, .T. )  // result: .F.

      ? WildMatch( "*bo"  , cStr, .F. )  // result: .T.
      ? WildMatch( "*bo"  , cStr, .T. )  // result: .F.

      ? WildMatch( "The"  , cStr, .F. )  // result: .T.
      ? WildMatch( "The"  , cStr, .T. )  // result: .F.

      ? WildMatch( "The*r", cStr, .F. )  // result: .T.
      ? WildMatch( "The*r", cStr, .T. )  // result: .T.

      ? WildMatch( "The?x", cStr, .F. )  // result: .T.
      ? WildMatch( "The?x", cStr, .T. )  // result: .F.

      USE Customer
      SET FILTER TO WildMatch( "W*s", FIELD->LastName )

      GO TOP
      DbEval( {|| QOut( FIELD->LastName ) } )
      // Output: Names starting with "W" and ending with "s"
      // Walters
      // Waters

      CLOSE Customer
   RETURN
 

and the description:
WildMatch() is a pattern matching function that searches a string for a search pattern. If the search pattern is found, the function returns .T. (true).
WildMatch() operates similarly to OrdWildSeek() but can be used as part of a SET FILTER condition on an unindexed database.

But to be honest, I have no experience with memo fields. I store such information in extra tables, which avoids the disadvantages of FPT files ;-)

Postby Gale FORd » Tue Nov 09, 2010 7:51 pm

But when you need to search memo fields or all of the fields, it is nice to use HiPer-SEEK indexing.

Otto,
To use HiPer-SEEK you can either pass the string you want to add to the index using hs_add(), or you can index the whole file on one or more fields using hs_index().

This is from the help file.
The HiPer-SEEK functions can be used to index and search any character based information the application can access.

Look at these examples.
Code:

* Example 1 - Indexes the whole dbf using hs_index()

#include "inkey.ch"   // defines K_ESC, which is tested after READ

    LOCAL cExpr := "test->FIRST + test->LAST + test->STREET + test->CITY"
    LOCAL nCount := 0, lStrict := .F.
    LOCAL cSearch := "Steve John Robert"
    LOCAL GetList := {}

    CLS
    USE test EXCL

    IF !file("TEST.HSX")
      @ 0,0 SAY "Building HiPer-SEEK Index..."
      hs_Index( "TEST.HSX", cExpr, 2 )
      ?? "Done!"
      Inkey(1)
      CLS
    ENDIF

    WHILE .T.
      cSearch := PadR( cSearch, 59 )
      @ 0,0 SAY "Search Values.....:" GET cSearch
      @ 1,0 SAY "Strict Match (Y/N):" GET lStrict PICTURE "Y"
      READ
      IF LastKey() == K_ESC
        CLS
        EXIT
      ENDIF
      cSearch := AllTrim( cSearch )

      @ 3,0 SAY "Setting HiPer-SEEK Filter ..."

      nCount := hs_Filter( "TEST.HSX", cSearch, iif( lStrict, cExpr, "" ))

      ?? "Done!"
      @ 4,0 SAY LTrim( Str( nCount )) + " records meeting filter condition."

      @ 6,0 SAY "Press any key to browse the matching records..."
      Inkey(0)

      GO TOP
      Browse()
      CLS
    ENDDO

* Example 2 - Adds strings manually to build the index using hs_add().

   local nRec, h, bExpr
    use test exclusive
    h := hs_Open( "TEST.HSX", 10, 1 )
    bExpr := { || test->mymemo }
    DO WHILE !eof()
      nRec := hs_Add( h, bExpr )
      IF nRec < 1 .OR. nRec != RecNo()
        ? "Error adding record " + LTrim( Str( RecNo() )) + "!"
      ENDIF
      SKIP
    ENDDO
    hs_Close( h )   // close the index handle when done

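Once an index like the one in Example 2 exists, it can also be searched record by record instead of through hs_Filter(). The following is only a sketch under assumptions: it assumes that hs_Set() stores the search string and that hs_Next() returns the next matching record number (0 when there are no more), as described in the HiPer-SEEK documentation. FindInMemo() is a made-up name, and the table and field names are taken from Example 2. As far as I know, index hits are only candidates, so each one is checked against the memo text itself.

```harbour
// Sketch only: assumes TEST.HSX was filled as in Example 2, so that
// index entry n belongs to record n of test.dbf.
PROCEDURE FindInMemo( cWord )
   LOCAL h, nRec

   USE test EXCLUSIVE
   h := hs_Open( "TEST.HSX", 10, 1 )
   IF h < 0
      ? "Cannot open HiPer-SEEK index, error:", h
      USE
      RETURN
   ENDIF

   hs_Set( h, cWord )                        // define what to look for
   DO WHILE ( nRec := hs_Next( h ) ) > 0     // 0 means no more matches
      dbGoto( nRec )
      // Confirm the candidate against the real memo text
      IF ! Deleted() .AND. Upper( cWord ) $ Upper( test->mymemo )
         ? RecNo(), Left( test->mymemo, 60 )
      ENDIF
   ENDDO

   hs_Close( h )
   USE
   RETURN
```

Called as FindInMemo( "invoice" ), this would list the record number and the first 60 characters of every memo that contains the word.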
 

Postby frose » Tue Nov 09, 2010 8:42 pm

Gale,

I especially have the following scenario in mind:

address table: Company, First, Last, Street, Zip, City, Telefon
Index 1, ..., n: Company, First + Last, Last, Street, Zip, City, Telefon

You need more than one HiPer-SEEK index, if you want to search separately in different fields, perhaps:
HS-Index 1, 2, 3: All fields, Company + First + Last, Street

With WildMatch you are flexible:

- ask the user for the search pattern, e.g. cPattern := "*robert?bosch*"
- ask the user for the filter condition, perhaps with a check box for each field, e.g.:
    cFilter := "Company + First + Last + Street + Zip + City + Telefon" // searching in all fields, finding Robert Bosch in name and street, OR
    cFilter := "Company + First + Last" // searching only in the name fields, finding Robert Bosch in names, OR
    cFilter := "Street" // searching only in the street field, finding Robert Bosch only in streets, OR
    etc.
- build the filter: bFilter := &( "{ || WildMatch( " + ValToPrg( cPattern ) + ", " + cFilter + ", .F. ) }" )
- set the filter: MsgRun( "Filter setting is in progress, please wait!", "Setting filter", { || ( cAlias )->( DbSetFilter( bFilter, cFilter ) ) } )
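The steps above could be wrapped into one small helper. This is just a sketch: SetWildFilter() is a hypothetical name, while WildMatch(), ValToPrg() and MsgRun() are the functions already used in the steps. Unlike the line above, it passes the complete filter expression as the text argument of dbSetFilter(), so that dbFilter() would report the real condition; that is a design choice of the sketch, not a requirement.

```harbour
// Hypothetical helper: cAlias is the work area, cPattern the wildcard
// pattern, cFields a field expression such as "Company + First + Last".
FUNCTION SetWildFilter( cAlias, cPattern, cFields )
   // Text form of the filter condition
   LOCAL cExpr   := "WildMatch( " + ValToPrg( cPattern ) + ", " + cFields + ", .F. )"
   // Compile the text into a codeblock
   LOCAL bFilter := &( "{ || " + cExpr + " }" )

   MsgRun( "Filter setting is in progress, please wait!", "Setting filter", ;
           { || ( cAlias )->( dbSetFilter( bFilter, cExpr ) ), ;
                ( cAlias )->( dbGoTop() ) } )

   RETURN NIL
```

A call would then look like SetWildFilter( "Cust", "*robert?bosch*", "Company + First + Last" ).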

In the following example, the only result is the record 'Robert Bosch GmbH':
[Image: screenshot not preserved]

Postby reinaldocrespo » Tue Nov 09, 2010 8:53 pm

Have you looked at FTS indexes in ADS? The free local ADS version works perfectly with dbf/cdx file pairs. These indexes can be used with AOFs and are very fast.


Reinaldo.

Postby James Bott » Tue Nov 09, 2010 9:28 pm

Everyone,

While all these samples are interesting and very useful, I am not sure we are addressing Otto's problem.

Otto, we still need a description of what kind of search you need to do. Do you need to search for a single word in all memo fields, or maybe for more than one word (AND, OR)? Is there only one memo field?

I assume that if an index is needed, it has to be maintained all the time rather than created when it is used?

Regards,
James

Postby Otto » Tue Nov 09, 2010 9:55 pm

Dear James,
the WildMatch approach Frank describes would meet my needs if I can include memo fields.
I will do some tests with the code Frank posted.
Best regards,
Otto

Postby Otto » Tue Nov 09, 2010 10:31 pm

Dear James,
this is my code for full text search. Speed is very good.
At the moment I only search the dbf file.
It would be great if I could search the fpt file the same way and find out which record number a memo block belongs to.
But it seems not possible. Maybe I can store a unique ID inside the memo block, which I suppress when I show the memo block data.
What do you think?
Best regards,
Otto
Code:


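static aKunden := {}   // collects the matching records

function SearchFile( suchbeg )
   local nLocation, cData
   local nOffset := 1
   local cDBF    := "kunden.dbf"
   local nPos    := 0

   suchbeg := AllTrim( Upper( suchbeg ) )
   cData   := Upper( MemoRead( cDBF ) )   // raw file image: header + fixed-size records

   if Len( cData ) < 1
      MsgInfo( "No data to search", "File Error" )
      return nil
   endif

   select kunden   // Header() and RecSize() refer to the current work area

   do while .t.
      // AT() with a start offset is xHarbour syntax; in Harbour use hb_At()
      nPos := At( suchbeg, cData, nOffset )
      if nPos == 0
         exit   // no further match
      endif
      if nPos <= Header()
         nOffset := Header() + 1   // match inside the file header: skip to the first record
         loop
      endif

      // Map the byte position of the match to a record number
      nLocation := Int( ( nPos - 1 - Header() ) / RecSize() ) + 1
      // Continue the search with the first byte of the following record
      nOffset   := Header() + nLocation * RecSize() + 1

      if nLocation <= LastRec()
         goto nLocation
         if ! Deleted()
            aAdd( aKunden, getrec() )
         endif
      endif
   enddo

return nil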
//----------------------------------------------------------------------------//

Postby James Bott » Tue Nov 09, 2010 11:42 pm

>But it seems not possible.

Of course it is possible, it is just a matter of level of effort. You just have to determine if it is worth the effort.

There are several ways it can be done.

1) Just scan the dbf by skipping records and checking each one. This would be very slow.

2) Reading the entire memo file into a memory var and then scanning only the active blocks for records that are not deleted. This would be faster.

3) Build an index on every word in the memo field for each record. This would require some fancy coding in a database class to make and keep up this index. Here the index would not be a normal RDD index but rather a separate DBF containing all the words and the records that they occur in. Then this DBF would be indexed. The DBF would be updated whenever a record was saved. This method would be very fast.

4) If HiPer-Six supports indexing of memo fields, then this would be an option also.

Maybe Gale, Frank, or Stefan can show us an example of how to do this using the HiPer-Six RDD (assuming it can be done).

Option 2 hinges on being able to find out (or figure out) how to skip unused blocks in the file.

Option 3 would require the most work since a fair amount of coding would be required.
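To give an idea of the amount of code, here is a rough sketch of option 3. Everything in it is an assumption for illustration: the table kunden.dbf with a memo field NOTES, the word list words.dbf, and the function names. It rebuilds the word list in one pass; keeping it current whenever a record is saved is the part that would belong in the database class. Words are cut at 20 characters and punctuation is not stripped.

```harbour
// Sketch of option 3: a separate DBF holding one row per word and record.
PROCEDURE BuildWordList()
   LOCAL cText, cWord, cKey, aSeen

   dbCreate( "words.dbf", { { "WORD", "C", 20, 0 }, { "RECNUM", "N", 10, 0 } } )
   USE words NEW EXCLUSIVE
   USE kunden NEW SHARED

   DO WHILE ! kunden->( Eof() )
      IF ! kunden->( Deleted() )
         // Turn line breaks into blanks so the text splits on one delimiter
         cText := Upper( kunden->notes )
         cText := StrTran( StrTran( cText, Chr( 13 ), " " ), Chr( 10 ), " " )
         aSeen := {}
         FOR EACH cWord IN hb_ATokens( cText, " " )
            cKey := PadR( cWord, 20 )
            // Store each distinct word only once per record
            IF ! Empty( cKey ) .AND. AScan( aSeen, cKey ) == 0
               AAdd( aSeen, cKey )
               words->( dbAppend() )
               words->WORD   := cKey
               words->RECNUM := kunden->( RecNo() )
            ENDIF
         NEXT
      ENDIF
      kunden->( dbSkip() )
   ENDDO

   SELECT words
   INDEX ON FIELD->WORD TO words   // this is the index that makes lookups fast
   kunden->( dbCloseArea() )
   words->( dbCloseArea() )
   RETURN

// Returns the record numbers of kunden.dbf whose memo contains cWord
FUNCTION FindWord( cWord )
   LOCAL aRecs := {}
   LOCAL cKey  := PadR( Upper( AllTrim( cWord ) ), 20 )

   USE words NEW SHARED
   SET INDEX TO words
   IF dbSeek( cKey )
      DO WHILE ! Eof() .AND. FIELD->WORD == cKey
         AAdd( aRecs, FIELD->RECNUM )
         dbSkip()
      ENDDO
   ENDIF
   USE
   RETURN aRecs
```

An AND search over several words would then be the intersection of the arrays returned by FindWord() for each word, and an OR search their union.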

So Otto, there you have four possible solutions.
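For comparison, option 1 needs almost no code, which is its only advantage. A minimal sketch, assuming the table from Otto's code is open under the alias kunden and has a memo field called NOTES (the field name and the function name are made up):

```harbour
// Sketch of option 1: visit every record and test the memo text.
FUNCTION ScanMemos( cSearch )
   LOCAL aHits := {}

   cSearch := Upper( AllTrim( cSearch ) )

   SELECT kunden
   GO TOP
   DO WHILE ! Eof()
      // "$" tests whether the left string occurs inside the right one
      IF ! Deleted() .AND. cSearch $ Upper( FIELD->notes )
         AAdd( aHits, RecNo() )
      ENDIF
      SKIP
   ENDDO

   RETURN aHits
```

Every call reads every memo, so the run time grows with the size of the table.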

Regards,
James

Postby Gale FORd » Wed Nov 10, 2010 1:40 am

You can index multiple fields, including memo fields, into one HiPer-SEEK index.
Look at my Example 1. It indexes 4 fields into one index. Any one of the fields could have been a memo field.

Check out the Parts inventory example in the Overview at the end of this post.
This quote tells a lot about how you can search/filter.
"HiPer-SEEK lets you specify almost any combination of words, partial words, numeric characters and phrases. Word order can be considered or not, as can case sensitivity"

This is the overview for HiPer-SEEK in the Six3 manual:
HiPer-SEEK Overview:

HiPer-SEEK is a relatively simple, yet powerful, system which uses Index
Applications' Fast Text Search technology. Married to the Mach SIx query
optimizer, it provides unmatched power for high speed searches and
filters.

The HiPer-SEEK system is a set of functions which create and maintain a
proprietary index file(s) enabling the rapid search of textual data from
within Clipper compiled applications. The HiPer-SEEK functions can be
used to index and search any character based information the application
can access.

Data for a HiPer-SEEK index usually consist of selected fields of .DBF
records. For example, you might want to be able to find people in a
customer database. HiPer-SEEK could build an index based on the contents
of the first name, last name, address, city, state and zipcode fields.
You would then be able to find individual records by specifying character
strings which occur anywhere in any of these fields.

Given "john," HiPer-SEEK would identify records for John Smith, Elton
John, 345 John White Avenue, and Johnson City. Additionally, the
application might limit matches to those containing "john" in a
particular field or in a particular position within a field. Searches can
be very general or very specific. The speed of HiPer-SEEK makes all sorts
of these additional operations possible.

Another example is a parts inventory/order system. Part numbers are found
by searching on a description field. Part No. WS-740283-B, described in
an 80 character field as "Windshield bracket, right side for model year
1983" is found by searching for "right windshield 83" or "brack
windshield." HiPer-SEEK lets you specify almost any combination of words,
partial words, numeric characters and phrases. Word order can be
considered or not, as can case sensitivity.

The default index extension for HiPer-SEEK indexes is .HSX, although, as
with any other index file types, you can make it what you want when you
create it.

One of the new functions, hs_Filter(), is especially unique. It combines
the CFTS technology with our Mach SIx query optimizer to allow you to
create extremely high speed filters on substrings within fields, text
within memos, and other typically non-indexable (or non-SEEKable) values.
For example, in a standard index built on LAST+FIRST, you can't find John
Smith by doing a SEEK on "John", since it's not at the front of the index.
Thus, Mach SIx would have previously been unable to optimize a filter
looking for "John" by using that index. With the marriage of CFTS and
Mach SIx this, and many other possibilities, are now possible.

Postby James Bott » Wed Nov 10, 2010 2:47 am

Gale,

Thanks for all the info. This looks very good.

I do wonder about what is required to switch from CDX indexes to HiperSix indexes. Is this going to require a lot of code changes due to differences in syntax?

Regards,
James

Postby frose » Wed Nov 10, 2010 8:00 am

Reinaldo,

yes, we are using ADS and FTS and I agree with you: it's fast, memo fields are indexable, ...
But the free local ADS version is (very) slow for larger tables, especially regarding FTS :(

So for LAN users with a running ADS (not ALS), FTS with AOF is the best choice.
For local users, it's better to buy an ADS licence and run a local ADS, or to use 'DBFCDX' with WildMatch.
Using WildMatch with ADS is not recommended, see viewtopic.php?f=3&t=19255.


Otto,

yes, WildMatch works with memo fields; I just tested it:

SET FILTER TO WildMatch( "*will*", RSTRSTT->Mymemo )
[Image: screenshot not preserved]
But memo fields cannot be indexed, so it might be slow(er) with large tables.