QString static callback function from CURL



tpf80
10th May 2007, 20:55
I am using libcurl to pull in data from various web sites. My write callback currently works as follows (it works correctly, but performs horribly):


struct MemoryStruct {
    char *memory;
    size_t size;
};

static size_t WriteMemoryCallback(void *ptr, size_t size, size_t nmemb, void *data) {
    size_t realsize = size * nmemb;
    struct MemoryStruct *mem = (struct MemoryStruct *)data;

    // grow "memory" by the number of bytes we have just read (+1 for the terminator);
    // use a temporary so the old block is not lost if realloc fails
    char *grown = (char *)realloc(mem->memory, mem->size + realsize + 1);
    if (!grown) {
        return 0; // returning less than realsize makes curl abort the transfer
    }
    mem->memory = grown;

    // copy the piece of data that curl handed us (ptr, realsize bytes) to the end of the chunk:
    memcpy(&(mem->memory[mem->size]), ptr, realsize);

    // update the size of the memory chunk:
    mem->size += realsize;

    // put a 0 at the end of the memory chunk
    mem->memory[mem->size] = 0;

    return realsize;
}


void FabwareMain::curl_init() {

    // init the memory to hold what we gather with CURL
    struct MemoryStruct chunk;
    chunk.memory = NULL; /* we expect realloc(NULL, size) to work */
    chunk.size = 0; /* no data at this point */

    // init curl:
    curl_global_init(CURL_GLOBAL_ALL);

    // create a curl handle:
    CURL *curl_handle = curl_easy_init();
    if (curl_handle) {
        // set curl options (the login page):
        curl_easy_setopt(curl_handle, CURLOPT_COOKIEFILE, ""); // empty name: enable the in-memory cookie engine
        curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L); // when redirected, follow the redirections
        curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20060601 Firefox/2.0.0.3 (Ubuntu-edgy)");
        curl_easy_setopt(curl_handle, CURLOPT_POST, 1L);
        curl_easy_setopt(curl_handle, CURLOPT_URL, "http://www.somewhere.com");
        curl_easy_setopt(curl_handle, CURLOPT_POSTFIELDS, "");
        curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);
        curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);

        // exec curl; only use the buffer if the transfer succeeded and returned data
        if (curl_easy_perform(curl_handle) == CURLE_OK && chunk.memory) {
            // convert the memory chunk to a QString
            resData = QString::fromUtf8(chunk.memory, (int)chunk.size);
            ui.textEdit->append(resData);

            // parse the QString
        }

        // clean up after ourselves:
        curl_easy_cleanup(curl_handle);
    }
    curl_global_cleanup();

    // clean up our memory chunk (free(NULL) is a no-op)
    free(chunk.memory);
}




The problem with this method is that, although it works, it consumes a lot of memory and time because the data is copied many times. First CURL copies the data into its own buffer, then the callback function copies it into the memory chunk, then we convert it into a QString, and finally we copy that into the text edit box.

I have found that I can modify the callback function to put the data from CURL into a QString as it comes in and parse out what I need on the fly. However, I cannot figure out how to make the QString in the callback function available outside of that function. I can't return the data, since it is CURL that calls the function, so I need to store it somewhere that both the callback function and the rest of the Qt program can access.

Here's the current callback function that I am modifying:


struct MemoryStruct {
    char *memory;
    size_t size;
};

static size_t WriteMemoryCallback(void *ptr, size_t size, size_t nmemb, void *data) {
    size_t realsize = size * nmemb;
    struct MemoryStruct *mem = (struct MemoryStruct *)data;

    // increase the size of "memory" by the size of the bytes that we have read
    mem->memory = (char *)realloc(mem->memory, mem->size + realsize + 1);

    if (mem->memory) {
        // ptr is not NUL-terminated, so pass the length explicitly
        QString qmemData = QString::fromUtf8((char *)ptr, (int)realsize);
        // parse out the stuff that I want
        // How do I get this QString (qmemData) to be accessible outside of this function?
    }
    return realsize;
}

How do I get the QString qmemData to be accessible outside of this function? I'm sure it's something simple that I am missing.
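One common way to answer this question is to pass a pointer to your own object through the callback's last argument, which libcurl fills with whatever was set via CURLOPT_WRITEDATA. The sketch below is only an illustration under stated assumptions: the class name Fetcher and its members are hypothetical, std::string stands in for QString so the snippet needs no Qt, and libcurl itself is not called (only the callback signature is reproduced).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class: stands in for whatever object should receive the data.
class Fetcher {
public:
    // Static "trampoline" with the signature libcurl expects for a write callback.
    // userdata is assumed to be the Fetcher* registered as the write data pointer.
    static std::size_t writeTrampoline(void *ptr, std::size_t size,
                                       std::size_t nmemb, void *userdata)
    {
        Fetcher *self = static_cast<Fetcher *>(userdata);
        return self->onData(static_cast<const char *>(ptr), size * nmemb);
    }

    const std::string &result() const { return result_; }

private:
    // Ordinary member function: it can touch any member of the object.
    std::size_t onData(const char *data, std::size_t len)
    {
        // The buffer is not NUL-terminated, so the length is passed explicitly.
        result_.append(data, len);
        return len; // report every byte as consumed
    }

    std::string result_;
};
```

With libcurl this would presumably be registered as curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, Fetcher::writeTrampoline) together with curl_easy_setopt(handle, CURLOPT_WRITEDATA, &fetcher); after the transfer the rest of the program reads fetcher.result(), so no global variable is needed.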

marcel
10th May 2007, 21:13
I don't know if it fits your need, but what do you think of a static member in your main GUI class?

Actually, it has to be a pointer: QString *qmemData;

You can use it anywhere, modify its contents, and possibly set a flag in your class to signal that the string has been modified, then update the widgets.

Regards

wysota
10th May 2007, 22:21
Using a QByteArray for the buffer would be a good thing too :)

BTW. Is the data writing here safe? You copy a pointer and not its contents. Who has the ownership of the pointer? The callback function or the caller? Because if the latter, then you may run into trouble.
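The ownership question can be made concrete with a small sketch. It assumes, as is usual for this kind of callback, that the buffer passed in belongs to the caller and may be reused once the callback returns, so the callback has to copy the bytes rather than keep the pointer. ByteBuffer and appendCopy are hypothetical names, and std::vector<char> stands in for a QByteArray.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical accumulator; std::vector grows geometrically, so appending many
// small chunks does not reallocate on every call.
struct ByteBuffer {
    std::vector<char> bytes;
};

// Write-callback-shaped function that copies the bytes it is given.
static std::size_t appendCopy(void *ptr, std::size_t size, std::size_t nmemb, void *userdata)
{
    const std::size_t realsize = size * nmemb;
    ByteBuffer *buf = static_cast<ByteBuffer *>(userdata);
    const char *src = static_cast<const char *>(ptr);
    // Copy the contents: the caller may overwrite *ptr after we return.
    buf->bytes.insert(buf->bytes.end(), src, src + realsize);
    return realsize;
}
```

Storing ptr itself instead of its contents would leave the buffer pointing at memory that the caller is free to overwrite with the next chunk.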

tpf80
10th May 2007, 22:42
This piece of code tells the CURL library where to put what it's gathering:

curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);
This piece of code tells it the callback function to use:

curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
Once you call curl_easy_perform, it does whatever the options tell it to do and calls the callback function whenever the remote server sends back data. I'm pretty sure it's safe, as copying the pointer is what the examples in the CURL install tarball show, and I haven't had any problems with it.

The main problem I face is that I am pulling in several megabytes of data, which normally would append to the data already in memory from the request. Rather than waste performance on pulling all of that data into memory I wanted to be able to look at the chunks as they come in, pull out what I want out of them, and pretty much throw away the rest.
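Looking at chunks as they arrive has one catch: the piece you are looking for can be split across two chunks. A minimal sketch of one way to handle that, assuming the interesting data is line-oriented and can be recognised by a marker string, is to carry the unfinished last line over to the next call. LineFilter, its fields, and the marker are hypothetical; only the callback signature mirrors libcurl.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical streaming filter: keeps only the lines that contain `needle`.
struct LineFilter {
    std::string needle;               // marker to look for (an assumption of this sketch)
    std::string partial;              // unfinished last line, carried to the next chunk
    std::vector<std::string> matches; // the lines we decided to keep
};

static std::size_t filterLines(void *ptr, std::size_t size, std::size_t nmemb, void *userdata)
{
    const std::size_t realsize = size * nmemb;
    LineFilter *f = static_cast<LineFilter *>(userdata);
    f->partial.append(static_cast<const char *>(ptr), realsize);

    // Process every complete line currently in the buffer.
    std::size_t start = 0;
    for (;;) {
        const std::size_t nl = f->partial.find('\n', start);
        if (nl == std::string::npos)
            break;
        const std::string line = f->partial.substr(start, nl - start);
        if (line.find(f->needle) != std::string::npos)
            f->matches.push_back(line);
        start = nl + 1;
    }
    // Throw away what was processed; keep only the incomplete tail.
    f->partial.erase(0, start);
    return realsize;
}
```

Memory use then stays proportional to the longest line plus the kept matches, rather than to the whole download.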

Since the callback function is a static function, I am not sure how I can make the data that is processed in it available to my Qt app outside of that function.

I did try the static member approach as marcel suggested, which compiled fine, however the program crashed when it got to the point that it started using that variable. Do you have an example of how this would work?

wysota
10th May 2007, 23:02
The main problem I face is that I am pulling in several megabytes of data, which normally would append to the data already in memory from the request. Rather than waste performance on pulling all of that data into memory I wanted to be able to look at the chunks as they come in, pull out what I want out of them, and pretty much throw away the rest.
So why do you use realloc for that? And is using curl really necessary? Do you use some specific options that QHttp can't handle?


Since the callback function is a static function, I am not sure how I can make the data that is processed in it available to my Qt app outside of that function.
Through global variables or static members (in a singleton class).


I did try the static member approach as marcel suggested, which compiled fine, however the program crashed when it got to the point that it started using that variable. Do you have an example of how this would work?

Something like this should work:

class S {
public:
    // must be static so it can be called without an existing object
    static S *instance() {
        static S *inst = new S;
        return inst;
    }
    const char *getPtr() const { return S::staticPtr; }
    static void callBack() {
        //...
        staticPtr = ...;
    }
private:
    static char *staticPtr;
};
char *S::staticPtr = 0;
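A fuller version of the same idea might look like the sketch below. It is an illustration, not the code from the post: ResponseStore and its members are hypothetical names, std::string replaces the raw char pointer, and the callback takes libcurl's write-callback arguments. One possible (unconfirmed) cause of the crash mentioned earlier is a static pointer that was never pointed at an allocated object; holding the string by value inside a function-local static avoids that case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical singleton that collects whatever the static callback receives.
class ResponseStore {
public:
    // Function-local static: constructed on first use, so there is no
    // uninitialised pointer to dereference.
    static ResponseStore &instance()
    {
        static ResponseStore inst;
        return inst;
    }

    // Static function with the signature of a libcurl write callback.
    static std::size_t callBack(void *ptr, std::size_t size,
                                std::size_t nmemb, void * /*unused*/)
    {
        const std::size_t realsize = size * nmemb;
        instance().data_.append(static_cast<const char *>(ptr), realsize);
        return realsize;
    }

    const std::string &data() const { return data_; }
    void clear() { data_.clear(); }

private:
    ResponseStore() {}
    ResponseStore(const ResponseStore &) = delete;
    ResponseStore &operator=(const ResponseStore &) = delete;

    std::string data_;
};
```

The trade-off is that all transfers share one buffer, so this only suits programs that run a single request at a time.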

patrik08
10th May 2007, 23:11
You don't need a callback if you set a time limit on curl!


If you need a callback, use http://doc.trolltech.com/4.0/qhttp.html

Inside
http://sourceforge.net/projects/qtexcel-xslt/
you will find more curl samples.





/* return file contents as a QString from a local or remote file ... like PHP */
QString curl_get_contents( QString fullFileName )
{
    QString inside = "";
    qDebug() << "###1 curl_get_contents " << fullFileName;
    QString xdcookiefile, xml_export_file, wwwnetfile;
    /* https URLs are not handled here */
    if (fullFileName.contains("https://", Qt::CaseInsensitive)) {
        return inside;
    }
    Init();
    qDebug() << "###2 curl_get_contents " << fullFileName;

    xdcookiefile = QString( "%1biscotti.html" ).arg( WORK_CACHEDIR );
    xml_export_file = QString( "%1_xml_export.xml" ).arg( WORK_CACHEDIR );
    wwwnetfile = QString( "%1wwwfull.html" ).arg( WORK_CACHEDIR );

    if (IsNetFile( fullFileName ) )
    {
        qt_unlink(wwwnetfile);
        qDebug() << "###3 curl_get_contents " << fullFileName;
        /*QString xgeturl = fullFileName; http://shop.ecoplanet.ch/info.php */
        /* the QByteArrays must stay alive for as long as the char pointers are used */
        QByteArray lop = wwwnetfile.toAscii();
        char *localfile = lop.data();
        QByteArray der = xdcookiefile.toAscii();
        char *xwcookiefile = der.data();
        QByteArray ba = fullFileName.toAscii();
        char *url = ba.data();
        qDebug() << "### entra save to " << localfile;
        qDebug() << "### get url " << url;
        CURL *curl_handle;
        FILE *outfile;
        curl_global_init(CURL_GLOBAL_ALL);
        curl_handle = curl_easy_init();
        outfile = fopen(localfile, "w");

        if (curl_handle != NULL && outfile != NULL) {
            /* CURLOPT_COOKIEJAR takes a char * file name to write the cookies to (same as in PHP) */
            curl_easy_setopt(curl_handle, CURLOPT_URL, url);
            curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);
            curl_easy_setopt(curl_handle, CURLOPT_COOKIEJAR, xwcookiefile );
            curl_easy_setopt(curl_handle, CURLOPT_COOKIEFILE, xwcookiefile );
            curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L);
#if defined Q_WS_MAC
            curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "Mac QT4 / PPK_W @ciz.ch" );
#endif
#if defined(Q_WS_WIN)
            curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "Windows QT4 / PPK_W @ciz.ch" );
#endif
            curl_easy_setopt(curl_handle, CURLOPT_MAXREDIRS, 2L);
            curl_easy_setopt(curl_handle, CURLOPT_TIMEOUT, 20L);
            /* with no write callback set, curl writes the body to this FILE * (formerly CURLOPT_FILE) */
            curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, outfile);

            CURLcode res = curl_easy_perform(curl_handle);
            /* close the file on success and on failure, before reading it back */
            fclose(outfile);
            outfile = NULL;

            if (res == CURLE_OK) {
                if (is_file(wwwnetfile)) {
                    inside = ReadFile( wwwnetfile );
                    if (inside.size() > 1) {
                        ERROR_MSG = "";
#if defined Q_WORKS_PEND
                        /* do not delete the file */
#else
                        qt_unlink(wwwnetfile);
#endif
                    }
                }
            } else {
                ERROR_MSG = "Error: failed to get remote file (possibly timed out)";
            }
        }
        if (outfile != NULL) {
            fclose(outfile);
        }
        if (curl_handle != NULL) {
            curl_easy_cleanup(curl_handle);
        }
        /* return the result grabbed from the local file */
        return inside;
    }

    /* not a network file: read it as a normal local file */
    inside = ReadFile( fullFileName );
    return inside;
}

tpf80
11th May 2007, 00:14
So why do you use realloc for that? And is using curl really necessary? Do you use some specific options that QHttp can't handle?

I used realloc in the first place because that's what the example from CURL itself used. For my purposes, since I don't need the whole data chunk that is pulled in, I've removed that part and now use pretty much this to make the data usable within the callback:


resData = QString::fromUtf8((char *)ptr, (int)realsize); // ptr is not NUL-terminated, so pass the length
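One thing to watch when converting each chunk separately: the buffer handed to the callback is not NUL-terminated, and a chunk boundary can fall in the middle of a multi-byte UTF-8 character. The helper below is a sketch with a hypothetical name (completeUtf8Prefix); it reports how many leading bytes of a chunk form complete UTF-8 sequences, so that the remaining one to three bytes can be held back and prepended to the next chunk. It does not validate the text, it only looks at the tail.

```cpp
#include <cassert>
#include <cstddef>

// Returns the number of leading bytes of data[0..len) that end on a complete
// UTF-8 sequence. Malformed input is passed through unchanged (returns len).
static std::size_t completeUtf8Prefix(const char *data, std::size_t len)
{
    std::size_t i = len;
    std::size_t back = 0;
    // Step back over at most three continuation bytes (10xxxxxx) to reach
    // the lead byte of the last sequence.
    while (i > 0 && back < 4) {
        --i;
        ++back;
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if ((c & 0xC0) != 0x80) {
            std::size_t need;
            if (c < 0x80)                need = 1; // ASCII
            else if ((c & 0xE0) == 0xC0) need = 2;
            else if ((c & 0xF0) == 0xE0) need = 3;
            else if ((c & 0xF8) == 0xF0) need = 4;
            else return len;                       // invalid lead byte
            // Complete if enough bytes follow the lead byte; otherwise cut before it.
            return (len - i >= need) ? len : i;
        }
    }
    return len; // empty input, or no lead byte near the end
}
```

In the callback, one would presumably convert only the first completeUtf8Prefix(buf, n) bytes with QString::fromUtf8(buf, n) and keep the rest for the next call.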

As for why I was using CURL instead of QHttp: it was mainly because I didn't see QHttp supporting HTTPS (at least in the free edition that I have at the moment), and I was already pretty familiar with CURL's functionality, both as a standalone program and from using it in PHP. This is my first attempt at using it with C++, though.

My problem is probably related to my unfamiliarity with using a variable in the way I need to. The static member code that you posted looks very close to what I had; I'm taking a look at it to see what I was missing.

tpf80
11th May 2007, 00:25
Actually, I found out that I might not need https for what I am doing. I think I'll try out QHttp and see if that doesn't solve my problem, rather than making a mess of my program trying to force it to work with curl.

wysota
11th May 2007, 00:31
As for why I was using CURL instead of QHttp: it was mainly because I didn't see QHttp supporting HTTPS (at least in the free edition that I have at the moment)
Qt 4.3+ supports SSL sockets. The release candidate of Qt 4.3 is already available at the Trolltech site.

tpf80
11th May 2007, 00:42
I also notice that QHttp doesn't really have a good way to do cookies like curl does.
It seems that, in order to keep my session, I would need to extract the cookie after each request and then add it to the next one in a semi-manual fashion.

When I use curl, I can just set an option that creates a "cookiejar" in memory when I first start it up, and it handles preserving the cookies itself.

So I guess I kind of have to decide what I want to do:

1) Use CURL and figure out this static member variable stuff to get the data I want out (since I already have curl logging in, handling cookies for the session, etc. perfectly)

or...

2) Rewrite the code with QHttp, which would feel cleaner, but I would have to worry about a few more things, like the cookies, myself.
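To give an idea of what the "semi-manual" cookie handling in option 2 involves, here is a rough sketch in plain C++. All names (CookieJar, storeCookie, cookieHeaderValue) are hypothetical, and the sketch makes several simplifying assumptions: the header name is matched with the exact spelling "Set-Cookie:", line endings are already stripped, and cookie attributes such as path, domain and expiry are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> CookieJar;

// Stores the name=value pair of one "Set-Cookie: ..." response header line.
// Returns false if the line is not a usable Set-Cookie header.
static bool storeCookie(const std::string &headerLine, CookieJar &jar)
{
    const std::string prefix = "Set-Cookie:";
    if (headerLine.compare(0, prefix.size(), prefix) != 0)
        return false;
    const std::size_t begin = headerLine.find_first_not_of(" \t", prefix.size());
    if (begin == std::string::npos)
        return false;
    // Everything after the first ';' is an attribute and is ignored here.
    std::size_t end = headerLine.find(';', begin);
    if (end == std::string::npos)
        end = headerLine.size();
    const std::string pair = headerLine.substr(begin, end - begin);
    const std::size_t eq = pair.find('=');
    if (eq == std::string::npos || eq == 0)
        return false;
    jar[pair.substr(0, eq)] = pair.substr(eq + 1);
    return true;
}

// Builds the value of the "Cookie" request header for the next request.
static std::string cookieHeaderValue(const CookieJar &jar)
{
    std::string out;
    for (CookieJar::const_iterator it = jar.begin(); it != jar.end(); ++it) {
        if (!out.empty())
            out += "; ";
        out += it->first + "=" + it->second;
    }
    return out;
}
```

With QHttp, the resulting string would presumably be set on the request header object under the name "Cookie" before each request. This is exactly the bookkeeping that curl's cookie engine does on its own.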

tpf80
11th May 2007, 01:02
You don't need a callback if you set a time limit on curl!


If you need a callback, use http://doc.trolltech.com/4.0/qhttp.html

Inside
http://sourceforge.net/projects/qtexcel-xslt/
you will find more curl samples.




In the code that you wrote, it looks like curl is writing to a file. How would I use it without a callback function if I wanted to pull the data into memory only and not write anything to a file?

wysota
11th May 2007, 01:02
I also notice that QHttp doesn't really have a good way to do cookies like curl does.
It seems that, in order to keep my session, I would need to extract the cookie after each request and then add it to the next one in a semi-manual fashion.
Not exactly (I use a base request header object where I set the cookie, so that I don't have to do it on every request), but in general you're correct.


When I use curl, I can just set an option that creates a "cookiejar" in memory when I first start it up, and it handles preserving the cookies itself.
Curl is definitely a higher-level tool than QHttp, so it has many more capabilities (QHttp is really only an interface to the protocol), but on the other hand it would be much easier to parse the result using QHttp.

tpf80
16th May 2007, 20:47
Thanks to all who replied. In the end I used libcurl for the HTTP work, and got it to work really well, in fact.

What I ended up doing was using the std string/char functions in the callback function, so that it extracts what it needs on the fly and writes only that to the memory chunk. Using those in the callback made things easier and gave much better performance.

In the main Qt program, I build a QString from the returned data chunk via QString::fromUtf8, and can then do with it what I want.

:)