Is there a clear way to parse HTML in Qt 5.7



lmofallis
22nd December 2016, 00:10
Hi,

I would like a powerful HTML parser that works with Qt C++ (I'm currently on Qt 5.7). I'm really tired of reading lots of articles without finding a clear, up-to-date parser.
I found libxml2 v2.9.4, but clear examples are rare. I also read about QtWebKit, but as I understand it, it is not supported in Qt 5.7.

I'm an amateur programmer coming from VB.NET, where I can use the good "HTML Agility Pack".

What I want is a parser that:

works on Windows and Linux.
supports at least HTML4 (HTML5 would be perfect).
doesn't need a control or a viewer to work.
has simple tutorials or examples.


I also found QXmlQuery, and I would like to know whether it is a good HTML parser.

Really, I'm tired of searching.
Thank you.

anda_skoa
22nd December 2016, 13:21
XML parsers are usually not viable because most Web content is unfortunately malformed and not valid XML.
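
To illustrate the point, here is a minimal sketch in Python (standard library only, not Qt). The sample markup and the helper names `parses_as_xml`, `TagCollector` and `collect_tags` are made up for this example. It feeds the same slightly broken HTML to a strict XML parser and to a lenient HTML parser:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: unquoted attribute value, unclosed <p> and <br>.
MALFORMED = "<html><body><p class=intro>First<br><p>Second</body></html>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def collect_tags(markup):
    """Return the start tags found in the markup, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(parses_as_xml(MALFORMED))  # the XML parser rejects this input
    print(collect_tags(MALFORMED))   # the HTML parser still reports the tags
```

The XML parser gives up at the first well-formedness error, while the HTML parser keeps going. Note that this is only a tokenizer-level demonstration; it does not apply the full error recovery a browser engine performs.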

Basically, the only way to parse real-life web content is with a browser engine, because browser engines have all sorts of workarounds for broken content.

In the case of Qt, that would currently be QtWebEngine.

User ayanda83 does quite a lot with it; check out his threads: http://www.qtcentre.org/search.php?searchid=6994657

Cheers,
_

lmofallis
22nd December 2016, 15:31
Thank you, Mr anda_skoa.

Almost all the time, I work with local HTML documents, from which I retrieve or remove tags or other content.
So rendering the HTML documents before processing them is not what I need.

ChrisW67
25th December 2016, 21:10
Have you considered a scripting language such as Python with Beautiful Soup? It might be a better choice for mass file manipulation.
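
As a rough sketch of what that could look like, assuming the third-party `beautifulsoup4` package is installed: the function names `strip_tags` and `extract_links` and the sample markup below are hypothetical, chosen to match the "retrieve or remove tags" use case described earlier in the thread.

```python
from bs4 import BeautifulSoup


def strip_tags(html, tag_names):
    """Return the markup with every element named in tag_names removed."""
    # "html.parser" is Python's built-in backend; no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(tag_names):
        tag.decompose()  # removes the element together with its contents
    return str(soup)


def extract_links(html):
    """Return the href value of every <a> element that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    # Hypothetical sample document.
    sample = '<div><p>Keep <a href="a.html">this</a></p><script>x()</script></div>'
    print(strip_tags(sample, ["script"]))
    print(extract_links(sample))
```

For mass file manipulation, the same functions could be called in a loop over the files in a directory, reading each file's text and writing the result back out.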