How do I work with HTML that is not valid XML



gryz
19th February 2014, 03:18
Hi all,

Sorry if this was already answered; I wasn't able to find a good solution after a few days of research.

How do I find specific elements in HTML loaded as a QString, given that the HTML is not valid XML?
That is, some of its tags have no corresponding closing tags:




<style>
pre.debug {
white-space: pre-wrap;
width: 90%;
overflow: hidden;
}
</style>
<link href="//fonts.googleapis.com/css?family=Open+Sans:300,400&lang=en" rel="stylesheet" type="text/css">
<style>
.banner {
text-align: center;
}
</style>


In the example above, QDomDocument::elementsByTagName() fails to return the <style> element that follows the <link> element.
I assume this is because <link> isn't closed properly.
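
To make that assumption concrete, here is a rough sketch in plain C++ (no Qt) that mimics how a strict XML parser tracks open elements. `unclosedTags` is a made-up helper name, not a Qt API, and the scanner is deliberately naive: it ignores comments, CDATA, and any `>` inside attribute values. It only illustrates why an unclosed `<link>` leaves the parser believing that everything after it is nested inside that element; it is not a way to parse HTML.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of Qt): returns the names of elements that
// a strict, XML-style parser would still consider open at end of input.
// Simplifications: no comments, no CDATA, no '>' inside attribute values.
std::vector<std::string> unclosedTags(const std::string& markup)
{
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::size_t end = markup.find('>', pos);
        if (end == std::string::npos)
            break;  // truncated tag, stop scanning
        const std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            continue;

        const bool closing = tag.front() == '/';      // </name>
        const bool selfClosing = tag.back() == '/';   // <name ... />

        // Element name: alphanumeric run after the optional leading '/'.
        std::size_t i = closing ? 1 : 0;
        std::string name;
        while (i < tag.size() && std::isalnum(static_cast<unsigned char>(tag[i])))
            name += tag[i++];
        if (name.empty() || selfClosing)
            continue;  // nothing to track for <name/> or malformed tags

        if (!closing)
            open.push_back(name);
        else if (!open.empty() && open.back() == name)
            open.pop_back();
    }
    return open;
}
```

Run on the snippet above, this should report `link` as the only element left open, while a variant with `<link ... />` should report none. Note that the unescaped `&` in the `href` value (`&lang=en`) is a separate well-formedness problem for an XML parser, so closing `<link>` alone may not be enough to make the fragment load.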

How do I address this with the least effort?

Thanks a lot in advance!

anda_skoa
19th February 2014, 10:17
You could try loading it into a QWebPage (for example with mainFrame()->setHtml()) and using its QWebElement API, such as documentElement() and findAll(), to access the internal DOM structure.

Cheers,
_