Hello,
I have a folder with thousands of documents: Word documents (doc, docx and rtf) and PDF documents.
I need to search the content of all these documents to see whether certain words occur in them.
The result should be a list of the documents that contain the word I searched for.
This process needs to run within my application.
Any suggestions?
Thank you very much in advance.
Searching in the content of documents
Regards,
Michel D.
Genk (Belgium)
_____________________________________________________________________________________________
I use : FiveWin for (x)Harbour v. 24.09 - Harbour 3.2.0 (February 2024) - xHarbour Builder (January 2020) - Bcc773
- Otto
- Posts: 6414
- Joined: Fri Oct 07, 2005 7:07 pm
- Has thanked: 30 times
- Been thanked: 2 times
Re: Searching in the content of documents
Michel,
Maybe you can use findstr?
Write a batch file with MemoWrit(), run it with WinExec(), and read the result back with MemoRead().
findstr /P "xbrowse" C:\FWH\samples\*.* >test.log
Best regards,
Otto
https://stackoverflow.com/questions/884 ... str-comman
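The round trip described above (run findstr, then read back its output) can be sketched as follows. This is only an illustration in Python, not FiveWin code, and the helper names are hypothetical. It assumes a search term without spaces, and it adds the findstr switches /M (print only the file name) and /I (ignore case), which were not in the original command. findstr itself is available on Windows only.

```python
import subprocess


def build_findstr_command(term: str, file_pattern: str) -> list[str]:
    """Build a findstr call that prints only the names of matching files.

    /P skips files with non-printable characters (as in the post above),
    /M prints just the file name, /I ignores case, /C: takes a literal string.
    """
    return ["findstr", "/P", "/M", "/I", f"/C:{term}", file_pattern]


def parse_findstr_log(log_text: str) -> list[str]:
    """Turn the /M output (one file name per line) into a list without repeats."""
    seen = []
    for line in log_text.splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def search_with_findstr(term: str, file_pattern: str) -> list[str]:
    """Run findstr (Windows only) and return the matching file names."""
    result = subprocess.run(
        build_findstr_command(term, file_pattern),
        capture_output=True,
        text=True,
        check=False,  # findstr exits with 1 when nothing matches
    )
    return parse_findstr_log(result.stdout)
```

On Windows, `search_with_findstr("xbrowse", r"C:\FWH\samples\*.*")` would return the list of matching files directly, without an intermediate test.log.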
********************************************************************
mod harbour - Let's go and conquer the Web
modharbour.org
https://www.facebook.com/groups/modharbour.club
********************************************************************
- Otto
Re: Searching in the content of documents
Carlos,
I remember testing FileSeek, but you need the paid version to get a CSV export of the results.
Best regards,
Otto
viewtopic.php?f=3&t=33244&p=196025&hilit=fileseek&sid=b0f3b637d2d0ef8daf74d1ff56516df8#p196025
- Otto
Re: Searching in the content of documents
Hello Michel,
findstr does not find text inside DOCX files, because a DOCX file is a ZIP archive with compressed content.
For DOCX I unzip each file and then search the XML files inside it.
In my test here, 116 DOCX files were searched and only one contained the search term.
Best regards,
Otto
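The unzip-and-search idea for DOCX can be sketched like this. It is a minimal Python illustration rather than the code used in the test above, and the function names are hypothetical. It assumes the body text lives in `word/document.xml`, which is where Word stores it. Headers, footers and footnotes sit in other XML parts and are not searched here, and XML entities such as `&amp;` are not decoded.

```python
import re
import zipfile
from pathlib import Path


def docx_contains(path: Path, term: str) -> bool:
    """Return True if the main document part of a DOCX contains the term."""
    try:
        with zipfile.ZipFile(path) as archive:
            xml = archive.read("word/document.xml").decode("utf-8", errors="ignore")
    except (zipfile.BadZipFile, KeyError, OSError):
        return False  # not a valid DOCX, or no main document part
    # Drop the XML tags so a word split across formatting runs can still match.
    text = re.sub(r"<[^>]+>", "", xml)
    return term.lower() in text.lower()


def search_docx_folder(folder: str, term: str) -> list[str]:
    """List the DOCX files under a folder whose body text contains the term."""
    return sorted(
        str(p) for p in Path(folder).rglob("*.docx") if docx_contains(p, term)
    )
```

Reading the XML straight from the archive avoids writing unzipped files to disk, which matters when the folder holds many documents. Note that stripping the tags also joins neighbouring paragraphs, so a match across a paragraph boundary is possible.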
Re: Searching in the content of documents
Hello Otto,
Thank you very much for your efforts trying to help me.
How does your suggestion hold up when one needs to search a few hundred thousand documents?
Does the system still do its job then?
I'll have to test it but I will only be able to test in the second half of next week since I'm going on holiday for one week.
But I'll start my test asap.
Thanks once again.
Regards,
Michel D.
Re: Searching in the content of documents
hi,
I have not tested it yet, but there seems to be a simple way using ADO.
Look on GitHub for "Windows-classic-samples-main.zip" (I have no link yet):
Windows-classic-samples-main.zip\Windows-classic-samples-main\Samples\Win7Samples\winui\WindowsSearch\WSFromScript\QueryEverything.vbs
---
page_type: sample
languages:
- vbscript
products:
- windows-api-win32
name: WSFromScript sample
urlFragment: wsfromscript-sample
description: Demonstrates to query Windows Search from a Microsoft Visual Basic script using Microsoft ActiveX Data Objects (ADO).
extendedZipContent:
- path: LICENSE
target: LICENSE
---
# WSFromScript sample
The WSFromScript code sample demonstrates how to query Windows Search from a Microsoft Visual Basic script using Microsoft ActiveX Data Objects (ADO).
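As a rough idea of what such an ADO query could look like, here is an untested Python sketch. It is not taken from the QueryEverything.vbs sample, and the function names are hypothetical. It assumes Windows, the third-party pywin32 package, and that the folder is already covered by the Windows Search index. Only file types the index can read will be found.

```python
def build_search_query(term: str, folder: str) -> str:
    """Build a Windows Search SQL query for files under `folder` containing `term`."""
    safe_term = term.replace("'", "''").replace('"', "")
    safe_folder = folder.replace("\\", "/").replace("'", "''")
    return (
        "SELECT System.ItemPathDisplay FROM SYSTEMINDEX "
        f"WHERE SCOPE='file:{safe_folder}' "
        f"AND CONTAINS('\"{safe_term}\"')"
    )


def query_windows_search(term: str, folder: str) -> list[str]:
    """Run the query through ADO and return the paths of the matching files."""
    import win32com.client  # pywin32; imported here so the query builder works anywhere

    connection = win32com.client.Dispatch("ADODB.Connection")
    connection.Open(
        "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"
    )
    try:
        recordset = win32com.client.Dispatch("ADODB.Recordset")
        recordset.Open(build_search_query(term, folder), connection)
        paths = []
        while not recordset.EOF:
            paths.append(recordset.Fields.Item("System.ItemPathDisplay").Value)
            recordset.MoveNext()
        recordset.Close()
        return paths
    finally:
        connection.Close()
```

Because the index is built in advance, this approach may scale better to very large folders than scanning every file on each search, but that would need to be measured.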
Greetings,
Jimmy