reading XML node and sub nodes
Hi all,
I need to parse an XML file. When a specific node is found, I need to get the exact text inside that tag, markup included, not only the "xml text".
Example:
Code:
<root>
<a>11111</a>
<a><b>test 123</b></a>
</root>
In this XML I need to be able to get: "11111" and "<b>test 123</b>".
I am using this code in Qt5:
Code:
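#include <QDebug>
#include <QXmlStreamReader>

void test1() {
    const char* rawXML =
        "<root>"
        " <a>11111</a>"
        " <a><b>test 123</b></a>"
        "</root>";
    QXmlStreamReader xml(rawXML);
    QStringRef s;

    // The document element must be <root>.
    xml.readNextStartElement();
    s = xml.name();
    if (s != "root") {
        return;
    }

    // Visit each <a> child and print its text.
    while (!xml.atEnd()) {
        xml.readNextStartElement();
        s = xml.name();
        if (s != "a") {
            break;
        }
        // IncludeChildElements concatenates the text of nested elements
        // and drops their tags.
        QString ss = xml.readElementText(QXmlStreamReader::IncludeChildElements);
        qDebug("%s", qPrintable(ss));
    }
}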
This is not working as I expect. I am getting "test 123" and not "<b>test 123</b>".
What is the best approach to handle this situation? How can I parse the XML and get the desired result?
EDIT: similar stackoverflow question: http://stackoverflow.com/questions/5...ing-html-in-qt
Re: reading XML node and sub nodes
Take a look at the tokenString() function. You may be able to construct the result as you go through the start, text, and end tokens of the <b> element.
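The idea of rebuilding the inner markup while walking the start, text, and end tokens can be illustrated without Qt. The following is only a sketch in plain standard C++: `innerMarkup` and its tiny tokenizer are hypothetical helpers, not Qt APIs, and it assumes well-formed input with no comments, CDATA sections, or processing instructions inside the target element, and no '>' inside attribute values. With QXmlStreamReader the same walk would be driven by the reader's tokens instead of this hand-rolled splitting.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Qt): collects the raw inner markup of
// every <tag> element by walking the input token by token.
// Assumes well-formed XML without comments, CDATA sections, or processing
// instructions inside the target, and no '>' inside attribute values.
std::vector<std::string> innerMarkup(const std::string& xml, const std::string& tag)
{
    std::vector<std::string> results;
    std::string current;  // inner markup of the element being collected
    int depth = 0;        // 0 = outside <tag>, 1 = directly inside it, ...
    std::size_t pos = 0;

    while (pos < xml.size()) {
        // Cut one token: either a "<...>" tag or a run of character data.
        const bool isTag = xml[pos] == '<';
        std::size_t end = xml.find(isTag ? '>' : '<', pos);
        if (end == std::string::npos) {
            if (isTag) break;      // unterminated tag: give up
            end = xml.size();      // trailing text
        } else if (isTag) {
            ++end;                 // include the closing '>'
        }
        const std::string token = xml.substr(pos, end - pos);
        pos = end;

        // Classify the token and extract the element name, if any.
        const bool isEnd = isTag && token.size() > 1 && token[1] == '/';
        const bool isEmpty = isTag && token.size() > 2 && token[token.size() - 2] == '/';
        const bool isStart = isTag && !isEnd && !isEmpty;
        std::string name;
        if (isTag) {
            const std::size_t from = isEnd ? 2 : 1;
            name = token.substr(from, token.find_first_of(" \t\r\n/>", from) - from);
        }

        if (depth == 0) {
            if (isStart && name == tag) {
                depth = 1;               // entering a target element
                current.clear();
            } else if (isEmpty && name == tag) {
                results.push_back("");   // <tag/> has no content
            }
            continue;
        }

        // Inside the target element: track nesting, copy tokens verbatim.
        if (isStart) ++depth;
        if (isEnd) --depth;
        if (depth == 0) {
            results.push_back(current);  // reached the matching end tag
        } else {
            current += token;
        }
    }
    return results;
}
```

Called as `innerMarkup("<root><a>11111</a><a><b>test 123</b></a></root>", "a")`, this returns the two strings `11111` and `<b>test 123</b>`. Because tokens are copied verbatim, whitespace inside the element is preserved exactly as written.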
Re: reading XML node and sub nodes
Alternatively use QDomDocument for parsing, QDomDocument::elementsByTagName to get all the <a> tags, then call QDomNode::save() on all their children.
Cheers,
_
Re: reading XML node and sub nodes
Another solution that does not work:
Code:
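// Assumed setup: rawXML is the same string as before; the declarations
// of ba, bytes, and device were not shown in the post.
QByteArray ba(rawXML);
QBuffer bytes;
bytes.setBuffer(&ba);
bytes.open(QIODevice::ReadOnly);
QIODevice* device = &bytes;
QXmlStreamReader xml(&bytes);
...
// Record the device position before and after reading the element.
qint64 pstart = device->pos();
QString ss = xml.readElementText(QXmlStreamReader::IncludeChildElements);
qint64 pend = device->pos();
// Try to re-read the raw bytes between the two positions.
char line[100];
int len = pend - pstart;
device->seek(pstart);
device->read(line, len);
device->seek(pend);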
This does not work because it seems that QXmlStreamReader buffers my whole data in advance, so "pos()" always returns the position of the last byte in the raw data.
Re: reading XML node and sub nodes
2nd attempt, using QDomDocument:
Code:
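#include <QDebug>
#include <QDomDocument>
#include <QTextStream>

void test2() {
    const char* rawXML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        "<root>"
        " <a>11111</a>"
        " <a><b>test 123</b></a>"
        "</root>";
    // Assumed setup: these declarations were missing from the post and
    // are inferred from how the names are used below.
    QByteArray ba(rawXML);
    QDomDocument xml;
    xml.setContent(ba);
    QDomElement rootElement = xml.documentElement();
    QDomNodeList a = xml.elementsByTagName("a");

    qDebug("root element has %d childs", rootElement.childNodes().count());
    qDebug("root element is %s ", qPrintable(rootElement.nodeName()));
    for (int i = 0; i < a.length(); i++) {
        QString aContent;            // fresh buffer for each node
        QTextStream ts(&aContent);
        a.at(i).save(ts, 0);         // serializes the <a> element itself
        qDebug("- [%s]", qPrintable(aContent));
    }
}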
Now, I get not only the content, but the tags as well. This is the output:
Code:
root element has 2 childs
root element is root
- [<a>11111</a>
]
- [<a>
<b>test 123</b>
</a>
]
I assume I can trim the leading <a> and the trailing </a>, but this is at the same level of ugliness I am trying to avoid.
Any other idea?
EDIT:
I should be testing more before posting. anda_skoa, I re-read your post and changed:
Code:
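const char* rawXML = "<root><a>11111</a><a><b>test 123</b></a></root>";
// Assumed: alist is the node list returned by elementsByTagName("a").
for (int i = 0; i < alist.length(); i++) {
    QDomNode a = alist.at(i);
    QString aContent;
    QTextStream ts(&aContent);
    a.firstChild().save(ts, 0);  // serialize the child of <a>, not <a> itself
    qDebug("- [%s]", qPrintable(aContent));
}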
Which is better:
Quote:
- [11111]
- [<b>test 123</b>
]
Still, I get an extra newline after the end tag, which is bad for me... but it's much better than before. It seems that QDomDocument is adding extra newlines when serializing. Am I correct? How can I disable this "feature"?
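One thing that may be worth checking first: the Qt documentation for QDomDocument::toString() says that an indent of -1 adds no whitespace at all, so passing -1 instead of 0 to save() might avoid the newline. That is untested here. As a fallback, the line break can be removed after the fact. The sketch below is plain standard C++ and `dropSerializerNewline` is a hypothetical helper name; it assumes the serializer appends exactly one line break, and it removes only that one so that newlines belonging to the user's text survive.

```cpp
#include <string>

// Hypothetical helper: drops exactly one trailing line break, if present.
// Assumes the serializer appended a single "\n" (or "\r\n") after the node.
std::string dropSerializerNewline(std::string s)
{
    if (!s.empty() && s.back() == '\n') {
        s.pop_back();
        if (!s.empty() && s.back() == '\r') {
            s.pop_back();  // handle a Windows-style "\r\n" ending
        }
    }
    return s;
}
```

With Qt strings the input would first be converted, for example with QString::toStdString(). Note that this still cannot tell a serializer newline from a user newline when a text node ends the output, which is the same ambiguity described above.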
Re: reading XML node and sub nodes
Does the newline matter? It is still the same XML content, no?
One other thing you could look into is XQuery; Qt has a module for that as well (called Qt XML Patterns).
Cheers,
_
Re: reading XML node and sub nodes
I modified the XML source to be on a single line (see the last example). The newline is important, since it can be part of the input (although I think it needs to be escaped there, so it may be possible to filter out the extra one). It still bugs me, as it feels like Qt is doing extra work behind my back.
Yes, XmlPatterns is another option... I was looking into it, but as I know nothing about it, I was hoping someone would give me the one-liner I am looking for... :)
Re: reading XML node and sub nodes
Hmm, I am not sure the newline at this point (after an end tag) is relevant as far as XML goes.
Cheers,
_
Re: reading XML node and sub nodes
Quote:
Originally Posted by anda_skoa
Hmm, I am not sure the newline at this point (after an end tag) is relevant as far as XML goes.
Cheers,
_
In theory you are right.
In practice, I am parsing XML files that contain data which (once) was user input, and thus I need to store the exact text that the user wrote.
Note to self:
If you ever get into problems ... UUENCODE or base64 that &^%&^% text.... unfortunately this time the format is pre-defined for me.
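For the record, this is roughly what the base64 route looks like when the format allows it. Qt already provides QByteArray::toBase64() for this; the sketch below is a plain standard C++ illustration of the encoding itself, and `base64Encode` is a hypothetical helper name. Once the user's text is encoded, it contains no whitespace or markup characters for an XML parser or serializer to alter.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: standard base64 (RFC 4648 alphabet, '=' padding).
std::string base64Encode(const std::string& input)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((input.size() + 2) / 3 * 4);

    for (std::size_t i = 0; i < input.size(); i += 3) {
        // Pack up to three input bytes into one 24-bit group.
        const std::size_t n = input.size() - i < 3 ? input.size() - i : 3;
        unsigned long group = 0;
        for (std::size_t j = 0; j < 3; ++j) {
            group <<= 8;
            if (j < n) {
                group |= static_cast<unsigned char>(input[i + j]);
            }
        }
        // Emit four 6-bit symbols, padding where input bytes were missing.
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += n > 1 ? alphabet[(group >> 6) & 0x3F] : '=';
        out += n > 2 ? alphabet[group & 0x3F] : '=';
    }
    return out;
}
```

The encoded string would be stored as the element text and decoded again after parsing, so a newline typed by the user travels as ordinary base64 characters.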