jiveaxe
8th January 2009, 16:03
Hi,
I have a problem parsing some XML files from MegaUpload. In short, my application extracts the file links from a MegaUpload folder URL. It usually works, but sometimes I get an error. First of all, the code:
void Dialog::parseXml()
{
    muitems.clear();
    if( !internetJob->isRequestAborted() && internetJob->getStatusCode() == 200 )
    {
        QByteArray data = internetJob->getData();
        qDebug() << "Checking for data!!\n";
        if( data.isEmpty() ) {
            qDebug() << "Megaupload returned an empty result!\n";
            return;
        }
        QDomDocument doc;
        if( !doc.setContent( data ) ) {
            qDebug() << "The XML obtained from Megaupload is invalid.";
            return;
        }
        QDomElement root = doc.documentElement();
        if( root.tagName() != "FILES" ) {
            qDebug() << "The xml file invalid.";
            return;
        }
        QDomNode n = root.firstChild();
        while( !n.isNull() )
        {
            QDomElement e = n.toElement();
            if( !e.isNull() )
            {
                if( e.tagName() == "ROW" )
                {
                    MegaUploadItem mui;
                    mui.name = e.attribute( "name", "" );
                    mui.url = e.attribute( "url", "" );
                    muitems.append(mui);
                }
            }
            n = n.nextSibling();
        }
    }
    qSort(muitems);
}
Most of the time the code above works and my QVector muitems is filled with the needed data (file name and URL). Sometimes, however, doc.setContent(data) fails and I don't know why: if I comment out the return in that error branch, parseXml() continues and the data are retrieved correctly. Other times setContent() fails in the same place, but only the first sibling gets parsed (I hope that is clear). Below are three different URLs: the first gives no error; with the second, doc.setContent(data) returns false, but with the return commented out the data are retrieved correctly; the third gives the error and, in addition, the while( !n.isNull() ) {...} loop runs only once.
http://www.megaupload.com/xml/folderfiles.php?folderid=D4HQHPLJ
http://www.megaupload.com/xml/folderfiles.php?folderid=0JY6SVP1
http://www.megaupload.com/xml/folderfiles.php?folderid=QEKO90W1
You can paste these URLs into your browser to look at the generated XML. They are structurally the same...
So my questions are:
1) Why does doc.setContent(data) return false, and how can I get more verbose error output?
2) Why is only the first tag parsed and not the remaining ones?
I was thinking of using QRegExp for parsing instead, but the captured text is wrong. Here is a sample row from the XML file:
<ROW name="VIDEO_TS.part06.rar" name_cut="VIDEO_TS.part06.rar" size="400 MB" url="http://www.megaupload.com/?d=3B1SORP1" downloadid = "3B1SORP1" sizeinbytes="419430400" expired="0"></ROW>
and here is my pattern:
QRegExp pattern("<ROW\\s((\\w+)\\s*=\\s*(\"[^\"]\"))+></ROW>");
The regular expression is probably wrong; could someone review it?
Best regards.