
Thread: How to read the text of a pdf file?

  1. #1

    Hello!

    I already know how to read text from a .txt file using QFile, QTextStream and so forth, but I don't know how to open a .pdf file and read its content. I recently tried to do it the same way, and what I got was:

    Qt Code:
    "%PDF-1.5
    %µµµµ
    1 0 obj
    <</Type/Catalog/Pages 2 0 R/Lang(pt-BR) /StructTreeRoot 8 0 R/MarkInfo<</Marked true>>>>
    endobj
    2 0 obj
    <</Type/Pages/Count 1/Kids[ 3 0 R] >>
    endobj
    3 0 obj
    <</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.2 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
    endobj
    4 0 obj
    <</Filter/FlateDecode/Length 147>>
    stream
    xœM±
    Â0„÷@ÞáÆ?C“üiÓ8”M«(²AÁÇ7YTnø8îƒÙ£ëÌ·#lßc#†$…Y3˜µmÎR0lN&¶Õu@^7é&…Å¥ÔFŠ…’ªxE'U3½Ÿªj讜#djé˜ÛÑ£€×Uq *H;)¦,+¯·ÚûÅ’/~[LsÄì_#ó
    endstream
    endobj
    5 0 obj
    <</Type/Font/Subtype/TrueType/Name/F1/BaseFont/Times#20New#20Roman/Encoding/WinAnsiEncoding/FontDescriptor 6 0 R/FirstChar 32/LastChar 120/Widths 14 0 R>>
    endobj
    6 0 obj
    <</Type/FontDescriptor/FontName/Times#20New#20Roman/Flags 32/ItalicAngle 0/Ascent 891/Descent -216/CapHeight 693/AvgWidth 401/MaxWidth 2568/FontWeight 400/XHeight 250/Leading 42/StemV 40/FontBBox[ -568 -216 2000 693] >>
    endobj
    7 0 obj
    <</Author(Martin)/Creator(þÿ

    while the only text in the document was "texto aqui" ("text here" in Portuguese).

    So how do I open a .pdf file and read its content in a Qt application?

    Thanks!

    Momergil

  2. #2

    Like you do when you are not using Qt... you use a third party library or utility, e.g. Poppler or PoDoFo, or you write your own based on the public PDF reference material. Qt does not contain any ability to interpret PDF files.
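
    To see why the plain-text read in the first post produces garbage, it may help to look at what the dump itself says: the page's /Contents entry points at object 4, and object 4 declares /Filter/FlateDecode, so the page content is stored compressed. Even after decompression it consists of PDF drawing operators rather than plain text, which is why a PDF library is needed. The following is only a minimal sketch in standard C++ (no Qt); the function names readFileBytes, looksLikePdf, declaresFlateStream and checkThreadSample are made up for illustration and are not part of Qt, Poppler or PoDoFo.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file as raw bytes (binary mode, no text decoding).
// Returns an empty string if the file cannot be opened.
std::string readFileBytes(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// True if the data starts with the PDF signature "%PDF-".
bool looksLikePdf(const std::string &bytes)
{
    return bytes.compare(0, 5, "%PDF-") == 0;
}

// True if at least one object declares the Flate (zlib/deflate) filter,
// i.e. its stream is compressed and cannot be read as plain text.
bool declaresFlateStream(const std::string &bytes)
{
    return bytes.find("/FlateDecode") != std::string::npos;
}

// Usage check against a shortened excerpt of the dump from the first post.
void checkThreadSample()
{
    const std::string sample =
        "%PDF-1.5\n4 0 obj\n<</Filter/FlateDecode/Length 147>>\nstream\n";
    assert(looksLikePdf(sample));
    assert(declaresFlateStream(sample));
    assert(!looksLikePdf("texto aqui"));
}
```

    A check like this only tells you that the file needs a real PDF parser; it does not extract any text by itself.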

  3. #3

    Quote (originally posted by ChrisW67):
    Like you do when you are not using Qt... you use a third party library or utility, e.g. Poppler or PoDoFo, or you write your own based on the public PDF reference material. Qt does not contain any ability to interpret PDF files.

    Hello Chris,

    Thanks very much.


    God bless,

    Momergil

  4. #4

    Do you mean extracting text from PDF files? Is there any difference between PDF text extraction and PDF-to-text conversion? Which approach is simpler and faster? Any suggestions would be appreciated. Thanks in advance.



    Best regards,
    Lee

  5. #5

    Running a converter tool is likely easier, as this just means running a child process and gathering its output or reading its result file.

    Using a PDF library is more code but also potentially gives you information on paging, formatting, etc.
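
    As a rough illustration of the converter-tool route, here is a minimal sketch in standard C++ that starts a child process and gathers its output. It makes several assumptions: Poppler's pdftotext utility is installed and on PATH, the platform offers POSIX popen() (on Windows the counterparts are _popen/_pclose), and the helper names shellQuote, runAndCapture and pdfToText are hypothetical. In a Qt application, QProcess (start(), waitForFinished(), readAllStandardOutput()) fills the same role and avoids going through a shell.

```cpp
#include <cassert>
#include <stdexcept>
#include <stdio.h>   // popen/pclose are POSIX, not standard C++
#include <string>

// Wraps a string in single quotes for a POSIX shell, escaping embedded quotes.
std::string shellQuote(const std::string &arg)
{
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'')
            quoted += "'\\''";
        else
            quoted += c;
    }
    quoted += "'";
    return quoted;
}

// Runs a shell command and returns everything it wrote to standard output.
// Throws if the process cannot be started or exits with a non-zero status.
std::string runAndCapture(const std::string &command)
{
    assert(!command.empty());
    FILE *pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("could not start: " + command);

    std::string output;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);

    if (pclose(pipe) != 0)
        throw std::runtime_error("command failed: " + command);
    return output;
}

// Hypothetical helper: assumes pdftotext is on PATH.
// The trailing "-" tells pdftotext to write the text to standard output.
std::string pdfToText(const std::string &pdfPath)
{
    return runAndCapture("pdftotext " + shellQuote(pdfPath) + " -");
}
```

    Calling pdfToText("document.pdf") would then return the extracted text as one string. Page boundaries and formatting are largely lost this way, which is the trade-off against using a PDF library.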

    Cheers,
    _
