Searching in the content of documents

Postby driessen » Wed Mar 16, 2022 10:03 am

Hello,

I have a folder with thousands of documents: Word documents (doc, docx and rtf) and PDF documents.
I need to search the content of all these documents to see whether certain words occur in them.
The result should be a list of the documents that contain the word I searched for.

This process needs to run from within my application.

Any suggestions?

Thank you very much in advance.
Regards,

Michel D.
Genk (Belgium)
_____________________________________________________________________________________________
I use : FiveWin for (x)Harbour v. 24.07 - Harbour 3.2.0 (February 2024) - xHarbour Builder (January 2020) - Bcc773

Re: Searching in the content of documents

Postby Otto » Wed Mar 16, 2022 11:11 am

Michel,
maybe you can use findstr?
Write a batch file with MemoWrit(), run it with WinExec(), and read the result back with MemoRead().
findstr /P "xbrowse" C:\FWH\samples\*.* >test.log

Best regards,
Otto

https://stackoverflow.com/questions/884 ... str-comman
********************************************************************
mod harbour - Vamos a la conquista de la Web
modharbour.org
https://www.facebook.com/groups/modharbour.club
********************************************************************

Re: Searching in the content of documents

Postby csincuir » Wed Mar 16, 2022 11:35 am

Or you can also use FileSeek:
https://www.fileseek.ca/
It's fast and easy to use.

Best regards

Carlos

Re: Searching in the content of documents

Postby Otto » Wed Mar 16, 2022 12:34 pm

Carlos,
I remember that I did tests with FileSeek, but you need the paid version to export the results as CSV.
Best regards,
Otto

viewtopic.php?f=3&t=33244&p=196025&hilit=fileseek&sid=b0f3b637d2d0ef8daf74d1ff56516df8#p196025

Re: Searching in the content of documents

Postby Otto » Wed Mar 16, 2022 9:54 pm

Hello Michel,
findstr cannot search inside DOCX files, since a DOCX is a ZIP archive and its text is stored compressed.
For DOCX I unzip the file and then search in the XML files it contains.

I ran a test here that way: 116 DOCX files were searched, and only one contains the search term.
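
The unzip-and-search idea can be sketched roughly as follows. This is not the code used for the test above, just a minimal illustration in Python under a few assumptions: the names `docx_contains` and `search_folder` are made up for the example, only the main part `word/document.xml` is checked (headers, footers and footnotes live in other XML parts), and the archive is read in memory instead of being unzipped to disk.

```python
import html
import re
import zipfile
from pathlib import Path


def docx_contains(path, term):
    """Return True if word/document.xml inside the DOCX mentions `term`."""
    try:
        with zipfile.ZipFile(path) as archive:
            xml = archive.read("word/document.xml").decode("utf-8", errors="ignore")
    except (zipfile.BadZipFile, KeyError):
        # Not a ZIP archive, or no main document part: treat as "no match".
        return False
    # Word often splits one word over several <w:t> runs, so strip the tags
    # before comparing. Paragraph breaks are lost too, which is a
    # simplification: text from adjacent paragraphs is joined without a space.
    text = html.unescape(re.sub(r"<[^>]+>", "", xml))
    return term.lower() in text.lower()


def search_folder(folder, term):
    """List every .docx below `folder` whose main text contains `term`."""
    return sorted(
        str(p) for p in Path(folder).rglob("*.docx") if docx_contains(p, term)
    )
```

Calling `search_folder(r"C:\docs", "xbrowse")` would return the matching file names as a list. The comparison is case-insensitive; PDF, DOC and RTF files need a different approach.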

Best regards,
Otto

Re: Searching in the content of documents

Postby driessen » Wed Mar 16, 2022 10:11 pm

Hello Otto,

Thank you very much for your efforts trying to help me.
How does your suggestion hold up when one needs to search a few hundred thousand documents?
Does it still perform acceptably?

I'll have to test it but I will only be able to test in the second half of next week since I'm going on holiday for one week.
But I'll start my test asap.

Thanks once again.
Regards,

Michel D.
Genk (Belgium)

Re: Searching in the content of documents

Postby Jimmy » Thu Mar 17, 2022 8:05 pm

hi,

I have not tested it yet, but there seems to be a simple way using ADO.

Look on GitHub for "Windows-classic-samples-main.zip" (I have no link yet):
Windows-classic-samples-main.zip\Windows-classic-samples-main\Samples\Win7Samples\winui\WindowsSearch\WSFromScript\QueryEverything.vbs

---
page_type: sample
languages:
- vbscript
products:
- windows-api-win32
name: WSFromScript sample
urlFragment: wsfromscript-sample
description: Demonstrates to query Windows Search from a Microsoft Visual Basic script using Microsoft ActiveX Data Objects (ADO).
extendedZipContent:
- path: LICENSE
target: LICENSE
---

# WSFromScript sample
The WSFromScript code sample demonstrates how to query Windows Search from a Microsoft Visual Basic script using Microsoft ActiveX Data Objects (ADO).
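
An untested sketch of what such an ADO query could look like, written in Python instead of VBScript. Assumptions: the function names `build_query` and `search_index` are invented for this example, the `pywin32` package is installed, and the folder is already covered by the Windows Search index with filters for the document types in question (otherwise nothing is returned).

```python
def build_query(folder, term):
    """Build a Windows Search SQL statement for one folder and one phrase."""
    scope = "file:" + folder.replace("\\", "/")
    # Double single quotes for the SQL literal; drop double quotes because
    # the term is wrapped in them to form a phrase.
    safe_scope = scope.replace("'", "''")
    safe_term = term.replace("'", "''").replace('"', "")
    return (
        "SELECT System.ItemPathDisplay FROM SystemIndex "
        "WHERE SCOPE='" + safe_scope + "' "
        "AND CONTAINS('\"" + safe_term + "\"')"
    )


def search_index(folder, term):
    """Run the query through ADO and return the matching paths (Windows only)."""
    import win32com.client  # pywin32; imported here so build_query works anywhere

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
    rs = win32com.client.Dispatch("ADODB.Recordset")
    rs.Open(build_query(folder, term), conn)
    paths = []
    while not rs.EOF:
        paths.append(rs.Fields.Item("System.ItemPathDisplay").Value)
        rs.MoveNext()
    rs.Close()
    conn.Close()
    return paths
```

Because the query runs against an existing index rather than opening each file, this route may scale better to very large folders, but that would need to be measured.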

Greetings,
Jimmy