enno
28th September 2009, 14:08
Can anyone recommend a class for parsing HTML?
One of the key differences from XML appears to be HTML's sloppy end tags. I have tried subclassing QXmlDefaultHandler, but parsing fails on missing end tags. Even when I continue after intercepting the fatal errors, normal event reporting stops.
I have thought about an 'insert' function (to insert, on the fly, the end tags that HTML lets authors omit) in a subclass of QXmlSimpleReader, but that also looks like a major job.
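The 'insert' idea can be sketched in plain C++ (standard library only, not Qt). This is a minimal sketch under loudly stated assumptions. The hypothetical `tokenize()` splits markup into tags and text. The hypothetical `balance()` keeps a stack of open elements and emits the end tags HTML allows authors to omit. The rules are a tiny illustrative subset: void elements such as `br`, same-name siblings such as `li` and `p`, and closing inner elements when an outer end tag arrives. Attributes are dropped and text is not escaped, so real pages with comments, scripts, or entities need much more work.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch; not part of Qt.
struct Token {
    enum Kind { Start, End, Text } kind;
    std::string value;  // lower-case tag name, or text content
};

// Tiny tokenizer. Assumes no comments, <script>, or '>' inside attribute values.
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t close = html.find('>', i);
            if (close == std::string::npos) break;  // unterminated tag: stop
            std::string inner = html.substr(i + 1, close - i - 1);
            bool isEnd = !inner.empty() && inner[0] == '/';
            if (isEnd) inner.erase(0, 1);
            // The tag name ends at whitespace or '/'; attributes are discarded.
            std::string name = inner.substr(0, inner.find_first_of(" \t\r\n/"));
            std::transform(name.begin(), name.end(), name.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            if (!name.empty() && name[0] != '!')  // skip <!DOCTYPE ...> and similar
                out.push_back({isEnd ? Token::End : Token::Start, name});
            i = close + 1;
        } else {
            std::size_t next = html.find('<', i);
            if (next == std::string::npos) next = html.size();
            out.push_back({Token::Text, html.substr(i, next - i)});
            i = next;
        }
    }
    return out;
}

// Emits a balanced token stream, inserting end tags that HTML lets authors omit.
std::vector<std::string> balance(const std::vector<Token>& tokens) {
    // Illustrative subsets only; the real HTML rules are far larger.
    static const std::set<std::string> voidTags = {"br", "hr", "img", "input", "meta", "link"};
    static const std::set<std::string> impliedEnd = {"p", "li", "td", "tr", "option"};
    static const std::set<std::string> scopeBoundary = {"ul", "ol", "table"};

    std::vector<std::string> out;
    std::vector<std::string> open;  // stack of currently open elements

    // Pops (and emits end tags for) elements until only `depth` remain open.
    auto closeDownTo = [&](std::size_t depth) {
        while (open.size() > depth) {
            out.push_back("</" + open.back() + ">");
            open.pop_back();
        }
    };

    for (const Token& t : tokens) {
        if (t.kind == Token::Text) {
            out.push_back(t.value);
        } else if (t.kind == Token::Start) {
            if (impliedEnd.count(t.value)) {
                // A new <li> (etc.) closes a same-named sibling open in the current list/table.
                for (std::size_t k = open.size(); k-- > 0;) {
                    if (scopeBoundary.count(open[k])) break;
                    if (open[k] == t.value) { closeDownTo(k); break; }
                }
            }
            if (voidTags.count(t.value)) {
                out.push_back("<" + t.value + "/>");  // void elements never have content
            } else {
                out.push_back("<" + t.value + ">");
                open.push_back(t.value);
            }
        } else {
            // End tag: close everything opened after its start tag; ignore stray end tags.
            auto it = std::find(open.rbegin(), open.rend(), t.value);
            if (it != open.rend())
                closeDownTo(static_cast<std::size_t>(open.rend() - it) - 1);
        }
    }
    closeDownTo(0);  // close whatever is still open at end of input
    return out;
}

// Joins the balanced pieces into one string.
std::string toXml(const std::string& html) {
    std::string xml;
    for (const std::string& piece : balance(tokenize(html))) xml += piece;
    return xml;
}
```

For example, `toXml("<ul><li>One<li>Two</ul><p>Para<br>end")` returns `<ul><li>One</li><li>Two</li></ul><p>Para<br/>end</p>`. In simple cases that output is much closer to well-formed XML than the input. A full HTML parser instead follows the tree-construction algorithm of the HTML specification, which covers many more implied-end-tag and error-recovery cases.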
Any suggestions on how to get a proper DOM document from an HTML source?
Enno