Mysterious QXmlStreamReader parsing behavior



ym1206
1st May 2008, 10:41
Hi all,

I am new to the Qt XML classes.
While reading the XmlStreamReader example from the book by Jasmin Blanchette, I ran into some mysterious QXmlStreamReader parsing behavior.
The small XML file from the book is listed below:

<?xml version="1.0"?>
<bookindex>
<entry term="sidebearings">
<page>10</page>
<page>34-35</page>
<page>307-308</page>
</entry>
<entry term="subtraction">
<entry term="of pictures">
<page>115</page>
<page>244</page>
</entry>
<entry term="of vectors">
<page>9</page>
</entry>
</entry>
</bookindex>
My question is: why do so many "Characters" tokens appear during parsing?
For example, take the line "<page>307-308</page>". After readElementText() the token is "EndElement". With one more readNext(), "</entry>" should be read and the token should be "EndElement" again, but a "Characters" token appears instead. Why?

Thanks for your help.


All the tokens reported by the parser are listed below:
StartDocument
StartElement
Characters
StartElement
Characters
StartElement
EndElement
Characters
StartElement
EndElement
Characters
StartElement
EndElement
Characters
EndElement
Characters
StartElement
Characters
StartElement
Characters
StartElement
EndElement
Characters
StartElement
EndElement
Characters
EndElement
Characters
StartElement
Characters
StartElement
EndElement
Characters
EndElement
Characters
EndElement
Characters
EndElement
EndDocument

kunalnandi
1st May 2008, 11:53
Hello Friend,

I can't say for certain, but it is probably whitespace. I think the parser reports the whitespace between tags (line breaks, indentation) as "Characters" tokens.
So just check for whitespace before you read the next element.

Try this to skip the whitespace:



QXmlStreamReader reader(&file); // file: an already opened QIODevice (assumed)
reader.readNext();
// Skip Characters tokens that contain only whitespace.
while( reader.isWhitespace() )
{
    reader.readNext();
}

ym1206
1st May 2008, 12:38
Thanks for your help.

I'll try to understand the example code with whitespace and newlines in mind. :D

lamle
14th June 2010, 20:56
Hi,
How about this XML document? Do you have any idea how to parse it using QXmlStreamReader or QDomDocument?

<p>
There are two types of hydraulic tools for the connecting rod shank screws, see
<ref xml:link="simple" inline="true" behavior="external" content-role="fig" href="fig-400731-low">Fig 07-20</ref>
. The screws and nuts in the tool of
<hp0>new design</hp0>
(introduced in year 2001) should be replaced before reaching
<hp0>1000</hp0>
loading cycles, i.e raising the pressure to nominal value 1000 times).
</p>

Any help please. I have to read all of it.
I am using something like this:
QString content=reader.readElementText(QXmlStreamReader::IncludeChildElements);
This returns all the text inside the <p> tag, including the text of the <ref> and <hp0> elements. Then I want to go back to the beginning of the <p> tag and call readNext() to read the attributes and values of the <ref> and <hp0> tags.

// value is assumed to hold the text of <p> read earlier.
while(!reader.atEnd()&&(!reader.isEndElement()||reader.name()!=QLatin1String("p"))){
    if(reader.isStartElement()){
        if(reader.name()==QLatin1String("ref")){
            // Read the attributes first: readElementText() moves the reader
            // to the end tag, where attributes() is empty.
            QXmlStreamAttributes attrs = reader.attributes();
            QString role=attrs.value("content-role").toString();
            QString href=attrs.value("href").toString();
            QString displaytext=reader.readElementText();
            QString link="<a href=\""+role+";"+href+"\">["+displaytext+"]</a>";
            value.replace(displaytext,link);
        }else if(reader.name()==QLatin1String("hp0")){
            QString bold=reader.readElementText();
            int bindex=value.indexOf(bold);
            // Wrap the first occurrence in <b> unless it is already wrapped.
            if(bindex!=-1&&value.indexOf("<b>"+bold+"</b>")==-1){
                value.replace(bindex,bold.length(),"<b>"+bold+"</b>");
            }
        }else{
            qDebug()<<"error. Undefined tag name inside <p>";
            qDebug()<<reader.name();
        }
    }
    reader.readNext();
}

HOWEVER, my big and funny problem is that QXmlStreamReader has no option to roll back to a previous start tag. In this case, after I call
QString content=reader.readElementText(QXmlStreamReader::IncludeChildElements);
the cursor is at the end tag </p>, and I cannot roll back to the beginning of the <p> tag to read the <ref> and <hp0> tags.
Any help please.