
Unwanted downloads with QWebPage/QWebPage



ouekah
10th May 2010, 15:47
Hi,

I use two QWebPage objects to download two webpages and the loadFinished(bool) signal of these pages is connected to one slot pageDownloaded(bool).

But it appears that one of these pages (not always the same one) is downloaded twice! Here is my code:




void main() {
    QWebPage page1;
    QWebPage page2;
    connect(&page1, SIGNAL(loadFinished(bool)), this, SLOT(pageDownloaded(bool)));
    connect(&page2, SIGNAL(loadFinished(bool)), this, SLOT(pageDownloaded(bool)));
    QString url1 = "http://www.google.com";
    QString url2 = "http://www.yahoo.com";
    page1.mainFrame()->load(QUrl(url1));
    qWarning() << "downloading page 1";
    page2.mainFrame()->load(QUrl(url1));
    qWarning() << "downloading page 2";
}
...
void pageDownloaded(bool ok) {
    qWarning() << "page downloaded";
}


Does anybody know how to avoid that situation?

JD2000
10th May 2010, 18:08
Quote: "But it appears that one of these pages (not always the same) is downloaded twice"

Sorry, I'm not clear about exactly what is happening.

Are you getting two copies of the same page, both pages plus an additional copy of one of them, both pages but three 'page downloaded' messages, or something else?

squidge
10th May 2010, 18:31
Well, both pages are using url1, so it's natural that the same page is downloaded twice...

ouekah
10th May 2010, 18:45
Sorry, wrong copy/paste... the second page actually uses url2.

What happens exactly is that pageDownloaded(bool) is executed more than twice, which to me means that a page has been downloaded more than once.

squidge
10th May 2010, 20:42
Maybe it's working as normal. For example, what happens if you request www.google.com and the website decides to redirect you to a different server to handle that request? Then you would download two web pages: the redirect and the actual page.

You should change your code so you can tell the pages apart, and then display the number of bytes downloaded by each page. If one of the pages shows two different sizes, then this is probably what's happening.

ouekah
11th May 2010, 13:42
I need to parse web pages to extract some data and I would like to use QtXml to do that. My initial idea was to use a QNetworkReply in conjunction with QDomDocument::setContent.

Unfortunately, HTML pages are not necessarily well-formed XML documents, so this doesn't work.

So I used QWebPage instead, because it can parse HTML pages directly. But now I have the following problem: when I download a web page (QWebPage::mainFrame()->load(QUrl())), the QWebPage::loadFinished() signal may be emitted several times, which causes errors in my application. Frankly, I don't know what is happening, but I don't think it's a redirection (I've visited these pages and their URLs do not change).

Now I think that I should maybe use a simple QNetworkAccessManager instead to do what I want.

But is there a way to convert HTML text into XML using Qt? That would hopefully solve my problem.

Talei
11th May 2010, 22:48
Maybe this is not the best help you can get, but first: why not just do qDebug() << incomingdata; and dump the data of each reply to see for yourself what's going on?
Alternatively, if you don't want to write or rewrite code, use a sniffing tool such as Wireshark: don't download anything else while capturing (less output in Wireshark makes the work easier), run the sniffer and your program, and see what is really going on.