Searching in the content of documents

driessen
Posts: 1422
Joined: Mon Oct 10, 2005 11:26 am
Location: Genk, Belgium

Post by driessen »

Hello,

I have a folder with thousands of documents: Word documents (doc, docx and rtf) and PDF documents.
I need to search the content of all these documents to see whether certain words occur in them.
The result should be a list of the documents that contain the word I searched for.

This process needs to run within my application.

Any suggestions?

Thank you very much in advance.
Regards,

Michel D.
Genk (Belgium)
_____________________________________________________________________________________________
I use : FiveWin for (x)Harbour v. 24.09 - Harbour 3.2.0 (February 2024) - xHarbour Builder (January 2020) - Bcc773
Otto
Posts: 6414
Joined: Fri Oct 07, 2005 7:07 pm
Has thanked: 30 times
Been thanked: 2 times

Re: Searching in the content of documents

Post by Otto »

Michel,
maybe you can use findstr?
Write a batch file with MemoWrit(), run it with WinExec(), and read the result back with MemoRead().
findstr /P "xbrowse" C:\FWH\samples\*.* >test.log
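
A minimal sketch of the same idea, written in Python rather than Harbour purely for illustration. It calls findstr directly and captures its output instead of going through a batch file and a log file. The function names are hypothetical, and the /M and /I switches are my additions to the command above (they list only file names and ignore case).

```python
import subprocess


def build_findstr_command(term, file_pattern):
    """Build a findstr command line that lists the files containing `term`."""
    # /P skips files with non-printable characters (as in the command above),
    # /M prints only the names of matching files, /I ignores case.
    # Note: findstr treats spaces inside the search string as "or".
    return ["findstr", "/P", "/M", "/I", term, file_pattern]


def find_files(term, file_pattern):
    """Run findstr (Windows only) and return the matching file names."""
    result = subprocess.run(
        build_findstr_command(term, file_pattern),
        capture_output=True,
        text=True,
        errors="replace",
    )
    # findstr exits with code 1 when nothing matches; stdout is then empty.
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Usage would look like find_files("xbrowse", r"C:\FWH\samples\*.*"). This only helps for files that store their text uncompressed.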

Best regards,
Otto

https://stackoverflow.com/questions/884 ... str-comman
********************************************************************
mod harbour - Vamos a la conquista de la Web
modharbour.org
https://www.facebook.com/groups/modharbour.club
********************************************************************
csincuir
Posts: 419
Joined: Sat Feb 03, 2007 6:36 am
Location: Guatemala
Has thanked: 3 times
Been thanked: 4 times

Re: Searching in the content of documents

Post by csincuir »

Or you can also use FileSeek:
https://www.fileseek.ca/
It's fast and easy to use.

Best regards

Carlos

Re: Searching in the content of documents

Post by Otto »

Carlos,
I remember testing FileSeek, but you need the paid version to get a CSV export of the results.
Best regards,
Otto

viewtopic.php?f=3&t=33244&p=196025&hilit=fileseek&sid=b0f3b637d2d0ef8daf74d1ff56516df8#p196025

Re: Searching in the content of documents

Post by Otto »

Hello Michel,
findstr cannot search inside DOCX files, because they are ZIP archives.
For DOCX I unzip the file and then search in the XML files.

I have a test here that unzips the DOCX files and then searches the XML.
It searched 116 DOCX files; only one contains the search term.
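
The unzip-and-search approach could be sketched roughly as follows (Python, for illustration only; this is not the test program mentioned above). It assumes the body text lives in word/document.xml inside the archive, which is the standard location for DOCX, and it ignores headers, footers and footnotes. The function names are hypothetical.

```python
import html
import re
import zipfile
from pathlib import Path


def docx_contains(path, term):
    """Return True if the main body of the DOCX at `path` contains `term`."""
    try:
        with zipfile.ZipFile(path) as archive:
            xml = archive.read("word/document.xml").decode("utf-8", errors="replace")
    except (zipfile.BadZipFile, KeyError, OSError):
        # Not a ZIP archive, no document.xml inside, or the file is unreadable.
        return False
    # Keep paragraphs apart, then drop all tags. Tags are removed without
    # inserting spaces so that a word split across several runs is still found.
    xml = xml.replace("</w:p>", " ")
    text = html.unescape(re.sub(r"<[^>]+>", "", xml))
    return term.lower() in text.lower()


def search_folder(folder, term):
    """Return the sorted paths of all .docx files under `folder` containing `term`."""
    return sorted(
        str(p) for p in Path(folder).rglob("*.docx") if docx_contains(p, term)
    )
```

Reading the XML straight from the archive avoids writing unzipped files to disk, which may matter when there are very many documents.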

Best regards,
Otto

Re: Searching in the content of documents

Post by driessen »

Hello Otto,

Thank you very much for your efforts trying to help me.
How does your suggestion hold up when one needs to search a few hundred thousand documents?
Does it still perform acceptably?

I'll have to test it, but I won't be able to until the second half of next week, since I'm going on holiday for a week.
I'll start testing as soon as I can.

Thanks once again.
Regards,

Michel D.
Genk (Belgium)
Jimmy
Posts: 1740
Joined: Thu Sep 05, 2019 5:32 am
Location: Hamburg, Germany
Has thanked: 2 times

Re: Searching in the content of documents

Post by Jimmy »

Hi,

I have not tested it yet, but there seems to be a simple way using ADO.

Look on GitHub for "Windows-classic-samples-main.zip" (I have no link yet):
Windows-classic-samples-main.zip\Windows-classic-samples-main\Samples\Win7Samples\winui\WindowsSearch\WSFromScript\QueryEverything.vbs

---
page_type: sample
languages:
- vbscript
products:
- windows-api-win32
name: WSFromScript sample
urlFragment: wsfromscript-sample
description: Demonstrates how to query Windows Search from a Microsoft Visual Basic script using Microsoft ActiveX Data Objects (ADO).
extendedZipContent:
- path: LICENSE
target: LICENSE
---

# WSFromScript sample
The WSFromScript code sample demonstrates how to query Windows Search from a Microsoft Visual Basic script using Microsoft ActiveX Data Objects (ADO).
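
An untested sketch of what such an ADO query might look like from Python (for illustration only; it is not taken from QueryEverything.vbs). It assumes the pywin32 package is installed, that the folder is in an indexed location, and that Windows has filters installed for the document types in question. The function names are hypothetical; the provider string and the SYSTEMINDEX query syntax are those of Windows Search.

```python
def build_query(term, folder):
    """Build a Windows Search SQL query for files under `folder` containing `term`."""
    # Double single quotes for SQL and drop double quotes, which delimit the phrase.
    safe_term = term.replace("'", "''").replace('"', "")
    scope = "file:" + folder.replace("\\", "/").replace("'", "''")
    return (
        "SELECT System.ItemPathDisplay FROM SYSTEMINDEX "
        f"WHERE SCOPE='{scope}' AND CONTAINS('\"{safe_term}\"')"
    )


def search_index(term, folder):
    """Query the Windows Search index through ADO (Windows only, needs pywin32)."""
    import win32com.client  # imported here so build_query works without pywin32

    connection = win32com.client.Dispatch("ADODB.Connection")
    connection.Open(
        "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"
    )
    recordset = win32com.client.Dispatch("ADODB.Recordset")
    recordset.Open(build_query(term, folder), connection)
    paths = []
    while not recordset.EOF:
        paths.append(recordset.Fields.Item("System.ItemPathDisplay").Value)
        recordset.MoveNext()
    recordset.Close()
    connection.Close()
    return paths
```

Because the index is built in advance, a query like this should not need to open each document at search time, which may help with very large folders.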
Greetings,
Jimmy