cURL & QWebKit don't mix



Frankenstein Coder
19th July 2011, 05:30
Hello, I have a small problem with Qt and cURL: Qt's web parsing library (QtWebKit) seems to work only with certain sites, and it's very frustrating because I can't figure out why some sites work and others don't. Here's my code (slightly altered to shorten it):


#include <iostream>
#include <string>
#include <QApplication>
#include <QtWebKit>
#include <QDomAttr>
#define CURL_STATICLIB
#include "curl/curl.h"

using namespace std;

// Collects the response body that libcurl delivers in chunks.
class Filter : public QString
{
public:
    QString content_;
    static size_t handle(char * data, size_t size, size_t nmemb, void * p);
    size_t handle_impl(char * data, size_t size, size_t nmemb);
};

// Static trampoline: libcurl passes the CURLOPT_WRITEDATA pointer as p.
size_t Filter::handle(char * data, size_t size, size_t nmemb, void * p)
{
    return static_cast<Filter*>(p)->handle_impl(data, size, nmemb);
}

size_t Filter::handle_impl(char* data, size_t size, size_t nmemb)
{
    content_.append(data);
    return size * nmemb;
}

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    CURL *curl;
    CURLcode res;
    QWebPage page;
    QWebFrame* frame = page.mainFrame();
    curl = curl_easy_init();
    if(curl) {
        Filter f;
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.teamliquid.net");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &Filter::handle);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &f);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        res = curl_easy_perform(curl);

        // Hand the downloaded markup to WebKit and count the links.
        frame->setHtml(f.content_);
        QWebElement document = frame->documentElement();
        QWebElementCollection elements = frame->findAllElements("a");
        int i = elements.count();

        curl_easy_cleanup(curl);
    }
    return 0;
}


The variable i never holds the correct number of links. It always comes back as some random number in the millions.

This only happens for certain sites, like the one in the program; Google, Yahoo, and other big sites work fine, so I'm really baffled.
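
One thing that may be worth checking in the code above: libcurl does not null-terminate the buffer it passes to the write callback, so the content_.append(data) call in handle_impl can read past the end of the chunk. Whether that explains the wrong link count is not confirmed. A minimal sketch of a callback that appends exactly size * nmemb bytes is below. BodyBuffer and appendChunk are hypothetical names that are not in the original code, and the sketch uses std::string so it compiles without Qt or libcurl.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the original post): holds the raw response bytes.
struct BodyBuffer {
    std::string bytes;
};

// Has the signature libcurl expects for CURLOPT_WRITEFUNCTION.
// `data` is NOT null-terminated; only size * nmemb bytes are valid.
std::size_t appendChunk(char* data, std::size_t size, std::size_t nmemb, void* userdata)
{
    const std::size_t total = size * nmemb;
    // Append exactly `total` bytes instead of relying on a terminating '\0'.
    static_cast<BodyBuffer*>(userdata)->bytes.append(data, total);
    // Returning a different count would tell libcurl that the write failed.
    return total;
}
```

After curl_easy_perform returns, the bytes could be converted in one step, for example with QString::fromUtf8(buf.bytes.data(), int(buf.bytes.size())). That assumes the page is UTF-8, which may not hold for every site.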

EDIT: I wasn't sure if this is supposed to be in newbie or not, sorry in advance if this is the wrong section.

Frankenstein Coder
20th July 2011, 03:03
Sorry for such a quick reply, but I've discovered a few new things and have some things to clarify.
1. The int i actually isn't returning random numbers in the millions: it only shows that if I look at the value at a breakpoint in Visual Studio. Its real value is 0 when printed (which is still wrong).

2. Sites that aren't being correctly parsed in Qt seem to have this at the start of their HTML doc:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">


Sites with:


<!DOCTYPE html>

OR


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">


seem to work 100% fine.

I tried a workaround by removing:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
with QString::remove, but it didn't work.
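
An exact-string removal only matches if the downloaded text is byte-for-byte identical to the pattern, including letter case and the line break between the two lines (which may be \r\n rather than \n). That is a guess at why the removal had no effect, not a confirmed cause. A minimal sketch of a more tolerant approach is below: it drops everything from the first "<!DOCTYPE" (in any letter case) through the next '>'. stripDoctype is a hypothetical helper name, and the sketch uses std::string rather than QString so it stands alone.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: removes the first <!DOCTYPE ...> declaration, if present.
std::string stripDoctype(const std::string& html)
{
    static const std::string marker = "<!doctype";

    // Case-insensitive search for the start of the declaration.
    auto start = std::search(
        html.begin(), html.end(), marker.begin(), marker.end(),
        [](char a, char b) {
            return std::tolower(static_cast<unsigned char>(a)) == b;
        });
    if (start == html.end())
        return html;  // no declaration found, nothing to strip

    // Assumes the declaration ends at the next '>' (no internal subset).
    auto end = std::find(start, html.end(), '>');
    if (end == html.end())
        return html;  // malformed declaration, leave the input untouched

    std::string result(html.begin(), start);
    result.append(end + 1, html.end());
    return result;
}
```

If the link count is still 0 after the declaration is stripped this way, the doctype itself is probably not the cause.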