Strange error: doc.setContent(data) returns false



jiveaxe
8th January 2009, 17:03
Hi,
I have a problem parsing some xml files from megaupload. My application, in short, extracts links from a MegaUpload folder url. It works well, but sometimes I get an error. First of all, the code:


void Dialog::parseXml()
{
    muitems.clear();
    if(!internetJob->isRequestAborted() && internetJob->getStatusCode() == 200)
    {
        QByteArray data = internetJob->getData();
        qDebug() << "Checking for data!!\n";
        if(data.isEmpty()) {
            qDebug() << "Megaupload returned an empty result!\n";
            return;
        }

        QDomDocument doc;
        if( !doc.setContent( data ) ) {
            qDebug() << "The XML obtained from Megaupload is invalid.";
            return;
        }

        QDomElement root = doc.documentElement();
        if( root.tagName() != "FILES" ) {
            qDebug() << "The xml file invalid.";
            return;
        }

        QDomNode n = root.firstChild();
        while( !n.isNull() )
        {
            QDomElement e = n.toElement();
            if( !e.isNull() )
            {
                if( e.tagName() == "ROW" )
                {
                    MegaUploadItem mui;
                    mui.name = e.attribute( "name", "" );
                    mui.url = e.attribute( "url", "" );
                    muitems.append(mui);
                }
            }
            n = n.nextSibling();
        }
    }
    qSort(muitems);
}

Most of the time the code above runs fine and my QVector muitems is filled with the needed data (file name and url). Sometimes, though, doc.setContent(data) returns false and I don't know why: if I comment out line 16 (the return; after the setContent check), parseXml() continues and the data is retrieved correctly. Other times I get the error at the same line and, with line 16 commented out, only the first sibling is parsed (I hope that is clear). Below are three different urls: with the first I get no error; with the second doc.setContent(data) returns false, but with line 16 commented out the data is retrieved correctly; the third gives the error plus only one iteration of while(!n.isNull()) {...}.

http://www.megaupload.com/xml/folderfiles.php?folderid=D4HQHPLJ
http://www.megaupload.com/xml/folderfiles.php?folderid=0JY6SVP1
http://www.megaupload.com/xml/folderfiles.php?folderid=QEKO90W1

You can paste these urls into your browser and take a look at the generated xml files. They are structurally the same...

So my questions are:

1) Why does doc.setContent(data) return false? How can I get more verbose output?
2) Why is only the first tag parsed and not the remaining ones?

I was thinking of using QRegExp for parsing, but the captured text is wrong. Here is a sample row of the xml file:


<ROW name="VIDEO_TS.part06.rar" name_cut="VIDEO_TS.part06.rar" size="400 MB" url="http://www.megaupload.com/?d=3B1SORP1" downloadid = "3B1SORP1" sizeinbytes="419430400" expired="0"></ROW>

and below is my pattern:


QRegExp pattern("<ROW\\s((\\w+)\\s*=\\s*(\"[^\"]\"))+></ROW>");

The regular expression is probably wrong; can someone review it?


Best regards.

init2null
8th January 2009, 19:07
Hello Giuseppe,

I'm not going to comment on the problem you're having with XML, since I haven't used it enough to be knowledgeable in that area. I can comment on your regexp, however.

To start with, using regular expressions for XML is not a good idea. They are far too brittle. If you choose to use one anyway, here is your corrected expression. I use a free program called Regex Coach (http://www.weitz.de/regex-coach/) to proofread mine. I didn't test this in Qt, but it should work.


QRegExp pattern("<ROW\\s+((\\w+)\\s*=\\s*(\"[^\"]*\")\\s*)+></ROW>");
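
One caveat if you go this way: a repeated group such as ((\w+)...)+ reports only its last repetition, so to read every attribute it is usually easier to match a single name="value" pair and let a loop supply the repetition. Here is a rough sketch of that idea. It uses std::regex from the C++11 standard library in place of QRegExp (my substitution, only so the sketch runs without Qt), the helper name parseAttributes is made up for illustration, and unlike the pattern above it captures the value without its surrounding quotes.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not part of the original code): collects every
// name="value" pair found in one line of the XML.
std::map<std::string, std::string> parseAttributes(const std::string &line)
{
    // Matches exactly one attribute; group 1 is the name, group 2 the value
    // without its quotes. The loop below supplies the repetition.
    static const std::regex attribute("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    std::map<std::string, std::string> attributes;
    for (std::sregex_iterator it(line.begin(), line.end(), attribute), end;
         it != end; ++it)
    {
        attributes[(*it)[1].str()] = (*it)[2].str();
    }
    return attributes;
}
```

With QRegExp the same loop can be written by calling indexIn() with a start offset and advancing that offset by matchedLength() after each hit. It is still a regexp over XML, so it stays as brittle as I said above.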

jiveaxe
8th January 2009, 22:27
I have tested your pattern but it does not seem to work: it captures only the last attributeName/attributeValue pair.

I have used your pattern in the following piece of code:



QTextStream ts(&data);

while (!ts.atEnd()) {
    QString line = ts.readLine();
    pattern.indexIn(line);
    int max = pattern.numCaptures() - 1;
    QStringList caps = pattern.capturedTexts();
    for ( int i = 1; i < max; i += 2 )
    {
        QString attributeName = caps.at(i);
        QString attributeValue = caps.at(i + 1);

        // do something with that data now
        qDebug() << attributeName + "=" + attributeValue + "\n";
    }
}


Thanks

init2null
9th January 2009, 02:38
It looks like a regexp keeps only one value for each set of parentheses: when a group repeats, only its last match is captured. Since that's the case, either put in this snippet once for each key-value pair
(\\w+)\\s*=\\s*(\"[^\"]*\")\\s*
or parse the XML. I looked a little at your initial questions, and I can answer the one about getting the XML error message. Use the optional arguments of the setContent method; on failure they receive the parser's message and the line and column where it stopped:

bool QDomDocument::setContent ( const QString & text, QString * errorMsg = 0, int * errorLine = 0, int * errorColumn = 0 )