Hi,
I have a problem parsing some XML files from MegaUpload. In short, my application extracts links from a MegaUpload folder URL. It works well, but sometimes I get an error. First of all, here is the code:

Qt Code:
  1. void Dialog::parseXml()
  2. {
  3.     muitems.clear();
  4.     if(!internetJob->isRequestAborted() && internetJob->getStatusCode() == 200)
  5.     {
  6.         QByteArray data = internetJob->getData();
  7.         qDebug() << "Checking for data!!\n";
  8.         if(data.isEmpty()) {
  9.             qDebug() << "Megaupload returned an empty result!\n";
  10.             return;
  11.         }
  12.
  13.         if( !doc.setContent( data ) ) {
  14.             qDebug() << "The XML obtained from Megaupload is invalid.";
  15.             return;
  16.         }
  17.
  18.         QDomElement root = doc.documentElement();
  19.         if( root.tagName() != "FILES" ) {
  20.             qDebug() << "The xml file invalid.";
  21.             return;
  22.         }
  23.
  24.         QDomNode n = root.firstChild();
  25.         while( !n.isNull() )
  26.         {
  27.             QDomElement e = n.toElement();
  28.             if( !e.isNull() )
  29.             {
  30.                 if( e.tagName() == "ROW" )
  31.                 {
  32.                     MegaUploadItem mui;
  33.                     mui.name = e.attribute( "name", "" );
  34.                     mui.url = e.attribute( "url", "" );
  35.                     muitems.append(mui);
  36.                 }
  37.             }
  38.             n = n.nextSibling();
  39.         }
  40.     }
  41.     qSort(muitems);
  42. }

Most of the time the code above runs fine and my QVector muitems is filled with the needed data (file name and URL). Sometimes, though, doc.setContent(data) fails and I don't know why; in fact, if I comment out line 16, parseXml() continues and the data is retrieved correctly. Other times I get the error on the same line and, in addition, only the first sibling is parsed (I hope that is clear). Below are three different URLs: with the first I get no error; with the second, doc.setContent(data) returns false, but with line 16 commented out the data is retrieved correctly; the third gives the error and also only one iteration of while(!n.isNull()) {...}.

http://www.megaupload.com/xml/folder...derid=D4HQHPLJ
http://www.megaupload.com/xml/folder...derid=0JY6SVP1
http://www.megaupload.com/xml/folder...derid=QEKO90W1

You can paste these URLs into your browser and take a look at the generated XML. They are structurally the same...

So my questions are:

1) Why does doc.setContent(data) return false? How can I get more verbose output?
2) Why is only the first tag parsed and not the remaining ones?

I was thinking of using QRegExp for parsing, but the captured text is wrong. Here is a sample row from the XML file:

<ROW name="VIDEO_TS.part06.rar" name_cut="VIDEO_TS.part06.rar" size="400 MB" url="http://www.megaupload.com/?d=3B1SORP1" downloadid = "3B1SORP1" sizeinbytes="419430400" expired="0"></ROW>
and below is my pattern:

Qt Code:
QRegExp pattern("<ROW\\s((\\w+)\\s*=\\s*(\"[^\"]\"))+></ROW>");

The regular expression is probably wrong; can someone review it?
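
Two things look off in the pattern above, as far as I can tell: `\"[^\"]\"` matches a quoted value of exactly one character (there is no `*` after the character class), and the repeated group leaves no room for the whitespace between one attribute and the next. Below is a minimal sketch of a two-step alternative: first isolate the attribute list of the ROW tag, then walk over the name="value" pairs one at a time. It uses std::regex instead of QRegExp so that it compiles without Qt; the function name parseRowAttributes is hypothetical, and it assumes attribute values contain neither `"` nor `>`.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not part of the original code): collects the
// attributes of the first <ROW ...> tag found in `row` into a map.
// Assumes attribute values never contain '"' or '>'.
std::map<std::string, std::string> parseRowAttributes(const std::string &row)
{
    std::map<std::string, std::string> attrs;

    // Step 1: capture everything between "<ROW" and the first '>'.
    static const std::regex rowRe(R"rx(<ROW\s+([^>]*)>)rx");
    std::smatch rowMatch;
    if (!std::regex_search(row, rowMatch, rowRe))
        return attrs;  // no ROW tag: return an empty map

    // Step 2: match one name="value" pair at a time. "\s*" around '='
    // tolerates spacing such as: downloadid = "3B1SORP1".
    // "[^"]*" accepts values of any length, including empty ones.
    static const std::regex attrRe(R"rx((\w+)\s*=\s*"([^"]*)")rx");
    const std::string list = rowMatch[1].str();
    for (std::sregex_iterator it(list.begin(), list.end(), attrRe), end;
         it != end; ++it) {
        attrs[(*it)[1].str()] = (*it)[2].str();
    }
    return attrs;
}
```

Called on the sample row above, `parseRowAttributes(row).at("url")` should give the download URL. The same two expressions should carry over to QRegExp with the backslashes and quotes escaped for a normal C++ string literal, though I have not tried that.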


Best regards.