XQuery with HTML?



liuyanghejerry
4th June 2011, 05:50
Hi, I am using XQuery in Qt to read HTML. The query is:

declare variable $inputDoc external;

doc($inputDoc)/tbody

When I run it on:

<tbody>
<xx>asd</xx>
</tbody>

it is valid, but when I run it on:

<tbody>
<tr class="epRowTwo">
<td colspan="2" class="c"><img src="/images/cds/137/main.png" alt="Main Image">
</td>
</tr>
<tr class="epRowOne">
<td class="b">Title</td>
<td>
<div style="width:300px;">
風のメッセージ/ このゆびとまれ (通常盤)
<br>
<div style="margin-left:2em;">
<b>English:</b> Message of the Wind / Follow Me (Regular Version)<br><b>Japanese (Romanized):</b> Kaze no Message / Kono Yubi Tomare (Regular Version)<br><b>Japanese (Trans):</b> Message of the Wind / Follow Me (Regular Version)<br>
</div>

</div>
</td>
</tr>
<tr class="epRowTwo">
<td class="b">Artist</td>
<td><div style="width:300px;">水橋舞 / あきよしふみえ (Mai Mizuhashi / Akiyoshi Fumie)</div></td>
</tr>
<tr class="epRowOne">
<td class="b">Catalog #</td>
<td>ZMCP-4082</td>
</tr>
<tr class="epRowTwo">
<td class="b">Release Date</td>
<td>2008-05-28</td>
</tr>
<tr class="epRowOne">
<td class="b">Language</td>
<td>Japanese</td>
</tr>
<tr class="epRowTwo">
<td class="b"># of Discs</td>
<td>1</td>
</tr>
<tr class="epRowOne">
<td class="b"># of Tracks</td>
<td>5</td>
</tr>
<tr class="epRowTwo">
<td class="b">Price/MSRP</td>
<td>1,365円</td>
</tr>
<tr class="epRowOne">
<td class="b">Run Time</td>
<td>20:06</td>
</tr>
<tr class="epRowTwo">
<td class="b">Your Rating</td>
<td>You must be logged in to rate.</td>
</tr>
<tr class="epRowOne">
<td class="b">Avg Rating</td>
<td><span class="cd-137">9.0000</span> (<span class="votescount-cd-">5</span>)</td>
</tr>
<tr class="epRowTwo">
<td class="b">Description</td>
<td>
<div style="width:300px;">
A Limited Edition CD+DVD version was also released on the same day.
</div>
</td>
</tr>
</tbody>


it is always reported as invalid.

I just don't know why...

Is HTML a kind of XML?

wysota
4th June 2011, 06:22
It is invalid because your HTML is not well-formed XML: the img tags have no closing tags.
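
A quick way to see the difference is to feed both fragments to any strict XML parser. The sketch below uses Python's standard library rather than Qt, purely as an illustration; the helper name `is_well_formed` and the shortened `bad` fragment are made up for this example and are not part of the original post.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The small fragment from the question: every tag is closed.
good = "<tbody><xx>asd</xx></tbody>"

# A shortened version of the failing fragment: <img> is never closed,
# so the parser meets </td> while <img> is still open.
bad = (
    '<tbody><tr><td><img src="/images/cds/137/main.png" alt="Main Image">'
    "</td></tr></tbody>"
)

print(is_well_formed(good))
print(is_well_formed(bad))
```

An XQuery engine needs a well-formed document before it can evaluate anything, which is why the second fragment is rejected before the query itself is considered.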

Lykurg
4th June 2011, 07:12
... and neither do the br tags. But you can load that HTML and do some simple QRegExp replacements for those tags, and with some luck you'll get a well-formed XML document. There are surely libraries out there that do that job for you.
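
The replacement idea might look roughly like the sketch below. It is written with Python's `re` module as a stand-in for QRegExp, so the pattern syntax and the names `VOID_TAGS` and `close_void_tags` are assumptions for illustration only. It handles just the two tags mentioned in this thread and will not cope with attribute values that contain `>`.

```python
import re
import xml.etree.ElementTree as ET

# Tags without closing tags in the posted fragment (assumed list; extend as needed).
VOID_TAGS = ("img", "br")

# Matches <img ...> or <br>, whether or not it is already self-closed.
_VOID_RE = re.compile(
    r"<(%s)\b([^<>]*?)\s*/?>" % "|".join(VOID_TAGS), re.IGNORECASE
)


def close_void_tags(html: str) -> str:
    """Rewrite <img ...> and <br> as self-closing <img .../> and <br/>."""
    return _VOID_RE.sub(lambda m: "<%s%s/>" % (m.group(1), m.group(2)), html)


# Shortened version of the fragment from the question.
fragment = (
    '<tbody><tr><td><img src="/images/cds/137/main.png" alt="Main Image">'
    "</td><td><b>English:</b> Message of the Wind<br></td></tr></tbody>"
)

fixed = close_void_tags(fragment)
root = ET.fromstring(fixed)  # parses now that every tag is closed
print(root.tag)
```

For anything beyond a known, simple page, a dedicated HTML-to-XML cleanup library is likely to be more robust than hand-written patterns.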