Increasing QProcess read performance



mcostalba
3rd December 2006, 20:38
Hi all,

I need to read a large amount of data (more than 30 MB) from an external application and process it. For reference, the external application on its own runs in:

$ time data_producer >> /dev/null
3.27user 0.12system 0:03.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15239minor)pagefaults 0swaps

Whereas when reading through QProcess, with an empty slot connected to the readyReadStdout() signal (to measure the maximum theoretical speed):

void DataLoader::on_readyReadStdout() { }

the time more than doubles, to almost 7 s.


The slot actually used in the WORKING code is:

void DataLoader::on_readyReadStdout() {

    // we use a circular buffer to store data chunks from loading process
    QByteArray* b = new QByteArray(proc.readStdout()); // copy c'tor uses shallow copy
    buffersRing.insert(buffersRingHead, b);

    if (++buffersRingHead == BUF_RING_SIZE)
        buffersRingHead = 0;
}

where buffersRing is declared as:
QPtrVector<QByteArray> buffersRing;

Then, on an independent timer, I process the buffers. Storing into the circular buffer is quite fast, about 5% of the total time, and very difficult to make faster IMHO.
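For readers without Qt 3 at hand, the producer/consumer ring described above can be sketched in standard C++. This is a minimal sketch, not Marco's actual code: ChunkRing, push() and pop() are hypothetical names, std::vector<char> stands in for QByteArray, and, unlike the slot above (which just advances buffersRingHead), it reports a full ring instead of overwriting a chunk the timer has not consumed yet.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical fixed-size ring of data chunks. The reading slot pushes,
// a periodic timer pops.
class ChunkRing {
public:
    explicit ChunkRing(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Stores a chunk at the head. Returns false if the ring is full,
    // so an unread chunk is never silently overwritten.
    bool push(std::vector<char> chunk) {
        if (count_ == slots_.size())
            return false;
        slots_[head_] = std::move(chunk);     // move: no byte copy
        head_ = (head_ + 1) % slots_.size();  // wrap around, like buffersRingHead
        ++count_;
        return true;
    }

    // Moves the oldest chunk into 'out'. Returns false if the ring is empty.
    bool pop(std::vector<char>& out) {
        if (count_ == 0)
            return false;
        std::size_t tail = (head_ + slots_.size() - count_) % slots_.size();
        out = std::move(slots_[tail]);
        slots_[tail].clear();  // leave the slot in a known empty state
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::vector<std::vector<char>> slots_;
    std::size_t head_ = 0;   // next slot to write
    std::size_t count_ = 0;  // number of unread chunks
};
```

Moving the vector into its slot plays the same role as the shallow QByteArray copy in the original: the chunk's bytes are not duplicated on the fast path.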

So my question is: is there a faster way to load the data, for example by tweaking the read pipe? If so, how should I do it?

Thanks
Marco

P.S.: I'm using Qt 3.3.6

jacek
3rd December 2006, 21:00
How long does this take?

$ time data_producer >> some_file

mcostalba
3rd December 2006, 21:21
$ time data_producer >> tmp.txt
3.34user 0.64system 0:03.99elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15238minor)pagefaults 0swaps

BTW, Linux performs the actual disk writes _after_ execution ends, so data_producer really writes to memory, i.e. to the OS disk write cache.

wysota
3rd December 2006, 21:31
Who said an empty on_readyReadStdout() is faster than one that actually reads the data? Did you consider that if you leave on_readyReadStdout() empty, the pipe buffer between the two processes gets bigger and bigger? I don't know how the OS behaves with 30 MB of data in a single buffer. I wouldn't be surprised if it blocked the producer until some receiver reads something from the buffer, allowing more data to go in.

Try using QProcess::setCommunication(0) to get rid of the pipe altogether and see if the performance improves.
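On the pipe-size question: a Linux pipe does not grow without bound. Since Linux 2.6.11 its default capacity is 16 pages (65,536 bytes with 4 KiB pages), and a blocking write to a full pipe waits until the reader drains some data. A minimal Linux-only sketch for inspecting and enlarging that capacity is below. Note that F_GETPIPE_SZ/F_SETPIPE_SZ only exist since Linux 2.6.35, i.e. they postdate this thread; pipe_capacity() and grow_pipe() are hypothetical helpers, and applying them to QProcess assumes you can get at its pipe descriptors, which I have not verified for Qt 3.

```cpp
#include <cassert>
#include <fcntl.h>   // fcntl, F_GETPIPE_SZ, F_SETPIPE_SZ (g++ defines _GNU_SOURCE)
#include <unistd.h>

// Returns the current capacity in bytes of the pipe behind 'fd', or -1 on error.
int pipe_capacity(int fd) {
    return fcntl(fd, F_GETPIPE_SZ);
}

// Asks the kernel to resize the pipe to at least 'bytes'. The kernel may
// round up; unprivileged callers are capped by /proc/sys/fs/pipe-max-size.
// Returns the resulting capacity, or -1 on error.
int grow_pipe(int fd, int bytes) {
    assert(bytes > 0);
    return fcntl(fd, F_SETPIPE_SZ, bytes);
}
```

A larger pipe lets the producer run further ahead of a busy reader before it blocks; it does not reduce the amount of data that has to be copied.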

jacek
3rd December 2006, 22:58
BTW, Linux performs the actual disk writes _after_ execution ends, so data_producer really writes to memory, i.e. to the OS disk write cache.
But IMO it's still better than measuring how fast the OS can discard data.

When your application is processing the data, it probably blocks the event loop for a while. Maybe you could process the data in smaller chunks, or call QApplication::processEvents() from time to time to read new data and free the buffer?
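The "smaller chunks" suggestion boils down to giving each timer tick a fixed work budget and then returning to the event loop. A minimal framework-free sketch, assuming the records have already been split into strings; process_some() and its budget parameter are hypothetical, and the callback stands in for whatever per-record work the GUI does.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Handles at most 'budget' pending records, then returns so the caller
// (e.g. a timer slot) can hand control back to the event loop.
// Returns how many records were handled in this call.
std::size_t process_some(std::deque<std::string>& pending, std::size_t budget,
                         const std::function<void(const std::string&)>& handle) {
    assert(budget > 0);
    std::size_t done = 0;
    while (done < budget && !pending.empty()) {
        handle(pending.front());
        pending.pop_front();
        ++done;
    }
    return done;
}
```

A time budget (checking std::chrono::steady_clock inside the loop) works the same way; a record count just keeps the sketch deterministic.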

In Qt4 there's QProcess::setupChildProcess(), but I couldn't find anything similar for Qt3. Maybe you could tweak the output buffer in the data producer (if that's possible; I've never tried it)?
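If data_producer is a C or C++ program you can modify, one way to tweak its output buffer is to give stdout a larger, fully buffered stdio buffer, so that it issues fewer, larger write() calls into the pipe. A minimal sketch using the standard setvbuf(); the helper name enlarge_stdout_buffer() and the 1 MiB size are assumptions, and whether it helps depends on how data_producer currently writes its output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Must be called before anything is written to stdout: setvbuf() may only
// be used before any other operation on the stream.
bool enlarge_stdout_buffer() {
    // Static storage, because the buffer must outlive every use of stdout.
    static char buffer[1 << 20];  // 1 MiB, an arbitrary illustrative size
    return std::setvbuf(stdout, buffer, _IOFBF, sizeof buffer) == 0;
}
```

Passing a null buffer is not enough here: in that case the size argument may be ignored and only the buffering mode changes.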

mcostalba
4th December 2006, 06:32
OK. With a receiver slot like:

void DataLoader::on_readyReadStdout() {

    proc.readStdout();
}

I get about 6.5 s, not a big difference.

The real receiver is:

void DataLoader::on_readyReadStdout() {

    // we use a circular buffer to store data chunks from loading process
    QByteArray* b = new QByteArray(proc.readStdout()); // copy c'tor uses shallow copy
    buffersRing.insert(buffersRingHead, b);

    if (++buffersRingHead == BUF_RING_SIZE)
        buffersRingHead = 0;
}

But I really don't know how to make it faster. Can you spot any hidden copies in the code above?

Thanks
Marco

mcostalba
4th December 2006, 06:52
I have tweaked the code to process only a limited amount of data on each call, instead of all the available data, so as not to stall the pipe, but the results are worse!!! About half a second more.

I think this is due to the higher number of context switches between the two processes, receiver and producer. It seems better, once the context has switched, to stay there a little longer rather than shorter.

wysota
4th December 2006, 08:45
What kind of data do you receive? If it's text data, you can process the incoming data only when a newline is received.

What exactly is buffersRing? How long does it take to insert an item there?

jacek
4th December 2006, 14:47
I have tweaked the code to process only a limited amount of data on each call, instead of all the available data, so as not to stall the pipe, but the results are worse!!!
How long does it take if you first read all of the data and then process it?

mcostalba
4th December 2006, 20:12
Here I am again. These are the answers you asked for:

1) QPtrVector<QByteArray> buffersRing;

2) It's text data made of records, each delimited by \0, so I first have to find the delimiting \0 while the data is still in QByteArray format; then I can convert the chunk to a QString:

// find next chunk, about 5% of total time
endIdx = (startIdx < (int)ba.size()) ? ba.find(0, startIdx) : -1;

// deep copy here by fromAscii(), about 4% of total time
const QString& chunk(QString::fromAscii(&(ba[startIdx]), endIdx - startIdx));


where ba is one of the buffers pointed to by buffersRing. I do this outside the fast path, on a timer set to 500 ms. There is some logic, omitted here, to handle partial lines/records and so on.
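The record extraction described above, including carrying a partial record from one buffer into the next, can be sketched without Qt. This is an illustration under assumptions, not the omitted logic from the real code: RecordSplitter, feed() and pending() are hypothetical names, std::memchr stands in for ba.find(0, startIdx), and std::string stands in for both QByteArray and QString (each record is copied once, as the fromAscii() call does).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Splits a stream of chunks into '\0'-terminated records. A record that
// straddles two chunks is kept in 'partial_' until its terminator arrives.
class RecordSplitter {
public:
    // Appends every complete record found in [data, data + len) to 'out'.
    void feed(const char* data, std::size_t len, std::vector<std::string>& out) {
        std::size_t start = 0;
        while (start < len) {
            // Find the next delimiter, like ba.find(0, startIdx).
            const void* hit = std::memchr(data + start, '\0', len - start);
            if (!hit) {
                partial_.append(data + start, len - start);  // half record: keep it
                return;
            }
            std::size_t end = static_cast<const char*>(hit) - data;
            partial_.append(data + start, end - start);
            out.push_back(std::move(partial_));
            partial_.clear();
            start = end + 1;  // skip the '\0'
        }
    }

    // Bytes buffered for a record whose terminator has not arrived yet.
    std::size_t pending() const { return partial_.size(); }

private:
    std::string partial_;
};
```

For example, feeding "ab\0cd" yields the record "ab" and keeps "cd" pending; a following "e\0f\0" then yields "cde" and "f".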


3) Times get only slightly better if I first read all of the data and then process it, but the GUI stays blank until loading ends, so to the user it _feels_ slower.


Now, here is what I have found:

QProcess::readyReadStdout() is emitted at a furious pace: always less than 10 ms apart, very often less than 5 ms. And each proc.readStdout() call returns very little data, 5-10 KB, so you can imagine how many calls are needed to read 30 MB (roughly 3,000 to 6,000)!

So I disconnected the signal from the slot and used a 50 ms timer to call the on_readyReadStdout() slot manually. Now the data chunks are much bigger, about 100-200 KB, and arrive less often. Total time improved by about 5%, though it's still much higher than when data_producer writes directly to a file.
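The timer-driven approach amounts to letting data accumulate and then draining everything available in a few large reads. Outside Qt, that can be sketched with plain POSIX calls: make the read end of the pipe non-blocking and call read() until it reports EAGAIN. set_nonblocking(), drain_available() and the 64 KiB read size are assumptions for illustration; Qt 3's QProcess does its own reading, so this is not a drop-in replacement for readStdout().

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Puts 'fd' in non-blocking mode so read() returns instead of waiting.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Appends everything currently available on the non-blocking 'fd' to 'out'.
// Returns true if the pipe is merely empty for now (EAGAIN/EWOULDBLOCK),
// false on EOF or a real error. Data read before EOF is still appended.
bool drain_available(int fd, std::vector<char>& out) {
    char buf[64 * 1024];  // one large read instead of many small ones
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            out.insert(out.end(), buf, buf + n);
        } else if (n == 0) {
            return false;  // writer closed its end: EOF
        } else if (errno == EINTR) {
            continue;      // interrupted by a signal: retry
        } else {
            return errno == EAGAIN || errno == EWOULDBLOCK;
        }
    }
}
```

Called from a 50 ms timer, each tick picks up whatever has accumulated since the previous tick in a handful of read() calls.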

Setting the timer to a longer interval (100-200 ms) does not give better results.

I can run other tests if you want.

Thanks
Marco