Is there a clear way to parse HTML in Qt 5.7
Hi,
I would like a powerful HTML parser that works with Qt C++ (I'm currently on Qt 5.7). I have read a lot of articles without finding a clear, recent parser.
I found libxml2 v2.9.4, but clear examples are rare. I also read about QtWebKit, but as I understand it, it is not supported in Qt 5.7.
I'm an amateur programmer coming from VB.NET, where I can use the good "HTML Agility Pack".
What I want is a parser that:
- works on Windows and Linux.
- supports at least HTML4 (HTML5 would be perfect).
- doesn't need a control or a viewer to work.
- has simple tutorials or examples.
I also found QXmlQuery, and I would like to know whether it is a good HTML parser.
Really, I'm tired of searching.
Thank you.
Re: Is there a clear way to parse HTML in Qt 5.7
XML parsers are usually not viable because most Web content is unfortunately malformed and not valid XML.
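To illustrate the point, here is a minimal sketch in Python (used only because it ships a strict XML parser in its standard library; the same behaviour is to be expected from any conforming XML parser). The sample strings and the helper name `parses_as_xml` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the first leaves <br> and <p> unclosed,
# which browsers accept but XML does not.
malformed = "<html><body><p>Hello<br>world</body></html>"
well_formed = "<html><body><p>Hello<br/>world</p></body></html>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(well_formed))  # prints True
print(parses_as_xml(malformed))    # prints False
```

A strict parser stops at the first mismatched tag instead of recovering, which is why it tends to fail on pages written by hand or produced by sloppy tools.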
Basically, the only way to parse real-life web content is a browser engine, because browser engines have all sorts of workarounds for broken content.
In the case of Qt, that would currently be QtWebEngine.
User ayanda83 does quite a lot with it; check out his threads: http://www.qtcentre.org/search.php?searchid=6994657
Cheers,
_
Re: Is there a clear way to parse HTML in Qt 5.7
Thank you, Mr. anda_skoa.
Almost all the time, I work with local HTML documents, from which I retrieve or remove tags or other content.
So rendering the HTML documents before processing them is not what I need.
Re: Is there a clear way to parse HTML in Qt 5.7
Have you considered a scripting language such as Python with Beautiful Soup? It might be a better choice for mass file manipulation.
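A rough sketch of what that could look like, assuming Python 3. Beautiful Soup is a third-party package, so to stay dependency-free this sketch uses the standard library's `html.parser` instead; Beautiful Soup offers a higher-level tree API on top of the same idea. The class name `LinkCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, with no rendering step."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical sample with unclosed <li>, <a> and <p> tags.
sample = '<ul><li><a href="a.html">A<li><a HREF="b.html">B</ul><p>tail'

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)  # prints ['a.html', 'b.html']
```

This parser is event-based and does not build a document tree, so it suits retrieving content. For removing tags and writing the document back, a tree-based library such as Beautiful Soup is probably more convenient.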