Hi all,
From my program I call pdftk to fill text fields and set checkboxes in an existing PDF file,
but pdftk does not handle some UTF-8 characters correctly (Romanian, for example).
I would like to do this directly from the program. Is it possible with the tPdf class of FiveWin, with Harbour, or with something else?
I don't think tPdf has methods that help in my case, and I didn't find anything in the example files.
Thanks for any help.
Marzio
fill text fields and checkbox of existing PDF files
- Antonio Linares
Re: fill text fields and checkbox of existing PDF files
Dear Marzio,
You're correct that pdftk has limitations when it comes to handling UTF-8 characters, especially for languages like Romanian that use diacritics (e.g., ă, î, ș, ț). This is a known issue with pdftk, as it doesn't fully support UTF-8 encoding for form filling, often resulting in garbled or missing characters. Since you're working with FiveWin and Harbour, and considering the tPdf class, let’s explore your options for filling PDF form fields with UTF-8 support directly from your program.
The tPdf Class in FiveWin/Harbour
The tPdf class in FiveWin (a library for Harbour and Clipper) is primarily designed for generating PDF files from scratch, not for manipulating existing PDFs or filling form fields in pre-existing documents. It’s a lightweight tool based on basic PDF generation principles, similar to early versions of libraries like FPDF. From the available documentation and examples (typically found in the FiveWin distribution or community forums), tPdf lacks built-in methods for:
Loading an existing PDF.
Interacting with form fields (e.g., filling text fields or setting checkboxes).
Handling AcroForms (the PDF form structure).
The examples provided with tPdf (e.g., creating text, lines, or simple graphics) focus on generating new PDFs rather than editing existing ones. As a result, tPdf is not suitable for your use case of filling form fields in an existing PDF with UTF-8 support.
Alternative Approaches
Since tPdf doesn’t meet your needs, here are some viable alternatives to achieve your goal directly from your Harbour/FiveWin program:
1. Use a Different PDF Library with Form-Filling Capabilities
You can integrate a more capable PDF library into your Harbour/FiveWin application. Here are two options that support form filling and UTF-8:
TCPDF (via PHP Integration)
TCPDF is a PHP library that supports UTF-8 and can create form fields in the PDFs it generates. While Harbour doesn’t natively interface with PHP, you could:
Write a small PHP script using TCPDF to handle the PDF manipulation.
Call this script from Harbour using hb_run() or RUN ( ... ) to pass data (e.g., via a temporary file or command-line arguments).
TCPDF provides methods such as TextField() and CheckBox() for creating fields, and it supports UTF-8 with appropriate font embedding (e.g., DejaVu Sans). It is primarily a generator, so verify that it can load and fill your existing form before committing to this route.
Challenges: Requires setting up a PHP environment and handling inter-process communication, which adds complexity.
iText (via Java or C# Integration)
iText is a powerful PDF library (available in Java and C#) that supports form filling with full UTF-8 support. You could:
Create a small Java or C# utility to process the PDF.
Call it from Harbour using a system command or a DLL interface (if you compile it to a native library).
iText’s AcroFields class can fill text fields and checkboxes while preserving UTF-8 characters.
Challenges: Requires Java or .NET setup and bridging Harbour to another language.
2. Use a Command-Line Tool with Better UTF-8 Support
Instead of pdftk, you could switch to a tool that handles UTF-8 better and call it from your Harbour program:
QPDF
QPDF is a command-line tool for structural transformations of PDF files (merging, splitting, encryption, linearization). It has no command-line option for filling form fields from an FDF file, so it cannot replace pdftk’s fill_form step on its own. It can still be useful after another tool has filled the form: its --generate-appearances option regenerates field appearances, subject to the limitations described in its documentation.
pdftk-java
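This is a maintained Java port of pdftk, so it needs a Java runtime on the machine. It might handle Romanian characters better than the original pdftk, but test it with your own form. Depending on how it is installed, the command may be named pdftk or pdftk-java. You can test it by calling it from Harbour:
harbour
// hb_run() takes the command as a string expression and returns the exit code
hb_run( 'pdftk-java input.pdf fill_form data.fdf output output.pdf' )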
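As a possible variation on the call above: pdftk's fill_form also accepts XFDF, an XML form-data format that is declared as UTF-8, and pdftk documents a need_appearances option intended for non-ASCII form values. The following is a minimal sketch, not a tested solution. It assumes that pdftk is on the PATH, that your pdftk version supports need_appearances, and that the source file is saved as UTF-8. The field names (nume, acord), the file names, and the helper functions BuildXFDF() and XmlEsc() are hypothetical. The checkbox value must match the "on" state defined in your PDF, which pdftk's dump_data_fields reports.

```harbour
// Sketch: write form data as XFDF (UTF-8 XML) and hand it to pdftk.
// Field names, values and file names are placeholders for illustration.
PROCEDURE Main()

   LOCAL hFields := { => }
   LOCAL nExit

   hFields[ "nume" ]  := "Ștefan Țăranu"   // text field; source file assumed to be UTF-8
   hFields[ "acord" ] := "Yes"             // checkbox; must equal the field's "on" state

   IF ! hb_MemoWrit( "data.xfdf", BuildXFDF( hFields ) )
      ? "Cannot write data.xfdf"
      RETURN
   ENDIF

   // hb_run() returns the exit code of the external program
   nExit := hb_run( "pdftk input.pdf fill_form data.xfdf output output.pdf need_appearances" )
   ? "pdftk exit code:", nExit

   RETURN

// Build an XFDF document from a hash of field name => value pairs
STATIC FUNCTION BuildXFDF( hFields )

   LOCAL cName
   LOCAL cXml := '<?xml version="1.0" encoding="UTF-8"?>' + hb_eol()

   cXml += '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">' + hb_eol()
   cXml += "<fields>" + hb_eol()
   FOR EACH cName IN hb_HKeys( hFields )
      cXml += '<field name="' + XmlEsc( cName ) + '"><value>' + ;
              XmlEsc( hFields[ cName ] ) + "</value></field>" + hb_eol()
   NEXT
   cXml += "</fields>" + hb_eol() + "</xfdf>" + hb_eol()

   RETURN cXml

// Escape the characters that are special in XML ("&" must be replaced first)
STATIC FUNCTION XmlEsc( cText )

   cText := StrTran( cText, "&", "&amp;" )
   cText := StrTran( cText, "<", "&lt;" )
   cText := StrTran( cText, ">", "&gt;" )
   cText := StrTran( cText, '"', "&quot;" )

   RETURN cText
```

Whether the filled text then displays correctly still depends on the fonts referenced by the form fields, so check the output in the PDF viewer your users actually have.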
3. Extend tPdf or Write Custom Code
If you’re comfortable with Harbour and PDF internals, you could extend tPdf or write a custom solution:
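Parse the existing PDF manually (the file structure is text-based, but most streams are compressed and newer files often pack objects into compressed object streams).
Locate the AcroForm fields (text fields and checkboxes are defined in the PDF structure).
Modify their values (the /V entry), writing non-Latin text as UTF-16BE with a byte order mark rather than raw UTF-8, and update or regenerate the field appearances.
Write the result back with a valid cross-reference table, either as a full rewrite or as an incremental update.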
This approach requires deep knowledge of the PDF specification (ISO 32000) and is time-intensive. Libraries like tPdf don’t provide this level of manipulation out of the box, so you’d essentially be building a mini PDF editor.
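Before investing in a custom parser, a quick feasibility check can show how accessible the form data is in a given file. The sketch below only scans the raw bytes for standard PDF keywords (/AcroForm, /FT /Tx for text fields, /FT /Btn for checkboxes and other buttons, /ObjStm for compressed object streams). It does not parse the PDF, and it will see nothing if the field dictionaries are stored inside compressed object streams. The default file name and the helper CountOf() are hypothetical.

```harbour
// Sketch: scan the raw bytes of a PDF for form-related keywords.
// This is a feasibility check only, not a PDF parser.
PROCEDURE Main( cFile )

   LOCAL cPdf

   IF Empty( cFile )
      cFile := "input.pdf"   // placeholder name
   ENDIF

   cPdf := hb_MemoRead( cFile )
   IF Empty( cPdf )
      ? "Cannot read", cFile
      RETURN
   ENDIF

   ? "AcroForm dictionary visible:", "/AcroForm" $ cPdf
   ? "Text field markers:", CountOf( "/FT /Tx", cPdf ) + CountOf( "/FT/Tx", cPdf )
   ? "Button field markers:", CountOf( "/FT /Btn", cPdf ) + CountOf( "/FT/Btn", cPdf )
   ? "Compressed object streams present:", "/ObjStm" $ cPdf

   RETURN

// Count the occurrences of cSub in cText
STATIC FUNCTION CountOf( cSub, cText )

   LOCAL nCount := 0
   LOCAL nPos := 0

   // hb_At() searches from the given start position and returns 0 when not found
   DO WHILE ( nPos := hb_At( cSub, cText, nPos + 1 ) ) > 0
      nCount++
   ENDDO

   RETURN nCount
```

If the markers are visible and no object streams are present, the field dictionaries are stored as plain text and manual editing is at least conceivable. Otherwise the file has to be decompressed first, which is one more reason to prefer an external tool.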
4. Use a Harbour-Compatible Library
There’s no widely known Harbour-native library for PDF form filling with UTF-8 support, but you could explore:
Haru (libHaru): A C library for PDF generation, available in Harbour through the hbhpdf contrib wrapper. It supports UTF-8 text but only creates new documents; it doesn’t load existing PDFs or fill form fields.
Custom FFI: Harbour can call C code through inline C (#pragma BEGINDUMP) or dynamic calls (hb_DynCall()), so a C/C++ library such as Poppler or MuPDF, which can read existing PDFs, could be wrapped. Check that the library’s API covers writing form field values before investing in a wrapper.
Recommendation
Given your constraints and the need for simplicity:
Short-Term Solution: Try pdftk-java first and test whether it handles Romanian UTF-8 characters correctly when called from Harbour. This keeps your workflow similar to what you’re already doing.
Long-Term Solution: Move the form filling into a small helper script, for example PHP with TCPDF if it turns out to handle your form, and have that script embed a font that covers the Romanian characters. You’d call it from Harbour like this (fill_pdf.php is a script you would write yourself):
harbour
cCmd := 'php fill_pdf.php input.pdf "field1=ăîșț" "checkbox1=on" output.pdf'
hb_run( cCmd )  // RUN cCmd would pass the literal text "cCmd" to the shell
Conclusion
The tPdf class in FiveWin/Harbour isn’t designed for your use case. For form filling with UTF-8 support, you’ll need either an external tool (like pdftk-java) or a more capable library called through a helper program (like iText or TCPDF). The easiest path is to test pdftk-java first, as it accepts the same commands as pdftk. If that fails, a helper script or utility called from Harbour is the next option. Let me know if you’d like help crafting a specific implementation!
Re: fill text fields and checkbox of existing PDF files
Dear Antonio,
Thank you for your kindness and for such a thorough answer.
Thanks also for your willingness to help me. I would like to try some of the proposed solutions,
but my skills are not sufficient to write a program in Python, C#, C++ or similar.
I would like to try "Custom FFI", but first I will study your proposals carefully. I will certainly focus on the two external programs you proposed.
If they work better than pdftk, that will already be a good result.
With pdftk, some letters came out completely distorted, so I used a function to replace them with a readable character. For example, I replaced the Romanian ț (t with a comma below) with a plain t, which is always better than seeing a completely different character.
I hope these tools work better. I will update you on the results.
Many thanks again.
Marzio
Re: fill text fields and checkbox of existing PDF files
An update:
I tried QPDF, but it doesn't seem to have functions for filling text fields. The details are at this link:
https://qpdf.readthedocs.io/en/stable/c ... on-options
For pdftk-java, I was unable to find a Windows version. The Linux version works only if Java is installed on the computer.
It would need a DLL version so that it can be used without installing Java.