
Thread: HTML parsing

  1. #1

    HTML parsing

    Can anyone recommend a class for parsing HTML?
    One of the key differences from XML appears to be HTML's sloppy end tags. I have tried subclassing QXmlDefaultHandler, but it dies on missing end tags. Even when I intercept the fatal errors and continue, normal event reporting stops.

    I have thought about adding an 'insert' function to a subclass of QXmlSimpleReader (to insert, on the fly, the end tags that HTML lets you omit), but that also looks like a major job.
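
    To give a feel for what such an 'insert' step involves, here is a rough sketch using only the C++ standard library. It is a toy, not a real HTML parser: the function name balanceTags, the short list of void elements, and the rule that a new <p> or <li> closes an open one of the same name are all assumptions made for illustration. It ignores comments, scripts, entities, and '>' inside attribute values, which is exactly where the "major job" starts.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Toy end-tag balancer (hypothetical helper, not part of Qt or Tidy).
// Copies the markup through, inserting the end tags that were left out.
std::string balanceTags(const std::string &html)
{
    // Elements that never have an end tag; emitted as <name/>. Partial list.
    static const char *const kVoid[] = {"br", "hr", "img", "input", "link", "meta"};
    static const std::set<std::string> voids(kVoid, kVoid + 6);

    std::vector<std::string> open;  // stack of currently open element names
    std::string out;
    std::string::size_type i = 0;

    while (i < html.size()) {
        if (html[i] != '<') {       // plain text: copy through unchanged
            out += html[i++];
            continue;
        }
        const std::string::size_type end = html.find('>', i);
        if (end == std::string::npos)
            break;                  // unterminated tag: discard the remainder
        const std::string tag = html.substr(i + 1, end - i - 1);
        i = end + 1;

        const bool closing = !tag.empty() && tag[0] == '/';
        std::string::size_type p = closing ? 1 : 0;
        std::string name;           // lower-cased element name
        while (p < tag.size() && std::isalnum(static_cast<unsigned char>(tag[p])))
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(tag[p++])));
        if (name.empty())
            continue;               // <!DOCTYPE>, comments, etc.: dropped

        if (closing) {
            // Look for the matching open element, from the top of the stack.
            std::vector<std::string>::size_type k = open.size();
            while (k > 0 && open[k - 1] != name)
                --k;
            if (k == 0)
                continue;           // stray end tag: dropped
            while (open.size() >= k) {  // close everything down to the match
                out += "</" + open.back() + ">";
                open.pop_back();
            }
            continue;
        }

        // Start tag: keep the attribute text, minus any trailing '/'.
        std::string attrs = tag.substr(p);
        const bool selfClosed = !attrs.empty() && attrs[attrs.size() - 1] == '/';
        if (selfClosed)
            attrs.erase(attrs.size() - 1);

        // A new <p> or <li> implicitly closes an open one of the same name.
        if ((name == "p" || name == "li") && !open.empty() && open.back() == name) {
            out += "</" + name + ">";
            open.pop_back();
        }
        if (selfClosed || voids.count(name) != 0) {
            out += "<" + name + attrs + "/>";
        } else {
            out += "<" + name + attrs + ">";
            open.push_back(name);
        }
    }
    while (!open.empty()) {         // close whatever is still open at the end
        out += "</" + open.back() + ">";
        open.pop_back();
    }
    return out;
}
```

    Calling balanceTags("<ul><li>one<li>two</ul>") closes both list items before the list itself. Real pages need far more rules than this, which is why a dedicated tool is usually the better route.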

    Any suggestions on how to get a proper DOM document from an HTML source?
    Enno

  2. #2

    Re: HTML parsing

    Suggested reading:
    http://www.qtcentre.org/forum/f-qt-p...html-4698.html

    Tidy can be found here:
    http://tidy.sourceforge.net/
    A C++ wrapper is here:
    http://users.rcn.com/creitzel/tidy.html#cplusplus

    Tidy can clean the HTML up into well-formed XHTML, which you should then be able to load into a QDomDocument.

  3. The following user says thank you to luf for this useful post:

    enno (1st October 2009)

  4. #3

    Re: HTML parsing

    Yes, parsing real-world (broken) HTML is not an easy task. You could try HTML Tidy, but since you are already using Qt I would advise against it and suggest using what Qt already provides: QtWebKit together with QWebElement, which is new in Qt 4.6. You will have your DOM ready in 15 minutes.

  5. The following user says thank you to piotr.dobrogost for this useful post:

    enno (1st October 2009)
