
[SOLVED] Piping to QProcess very slow performance



Phlucious
19th March 2013, 20:53
I have an external compression application that can run independently or piped via stdin. I would like to implement the piping option to take advantage of the performance benefits of avoiding writing to disk twice. My data is a binary stream of several MB, often a few GB.

I have implemented this using QProcess, launching the executable in QIODevice::ReadWrite mode with the appropriate arguments using QProcess::start(). My data is then output using a QDataStream attached to my QProcess. I don't have access to the outside application's source code, but other implementations have done this successfully using libraries other than Qt.

Two things. First, no data actually appears to transfer until I have my application sit around waiting for the QProcess to finish. How do I force a flush of QProcess?

Second, when the actual data transfer between processes occurs, it is incredibly slow. The transfer rate appears to be around 400 KB/s. When I compress a file in stand-alone mode, it takes approximately 1.369 seconds to run a 68 MB file, so I know that the compression itself is relatively fast. The same file, when piped, takes approximately 170 seconds, and the CPU reports 0% load for both processes.

I suspect that the slow performance has something to do with the fact that QProcess is optimized for basic text communication between processes (very small buffers), not for huge data transfers, but I see no way to change the buffer size for QProcess so that I can test this theory.

I'm developing and running on Windows 7 64bit Pro, using MSVC2010.

jesse_mark
19th March 2013, 23:01
First, no data actually appears to transfer until I have my application sit around waiting for the QProcess to finish. How do I force a flush of QProcess?

Do you use the readyRead, readyReadStandardError, or readyReadStandardOutput signals?
They are emitted every time new data arrives from the process.
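
Something like this (just a sketch; the class, member, and slot names are made up):

    // in the constructor, after creating the QProcess
    connect(m_process, SIGNAL(readyReadStandardOutput()), this, SLOT(onStandardOutput()));

    // slot: read whatever the child process has produced so far
    void MyClass::onStandardOutput()
    {
        QByteArray chunk = m_process->readAllStandardOutput();
        // handle the chunk here...
    }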

Phlucious
20th March 2013, 00:36
Do you use the readyRead, readyReadStandardError, or readyReadStandardOutput signals?
They are emitted every time new data arrives from the process.

I have no control over the compressor's source code, so I don't know how he retrieves from stdin. I'm guessing he waits for some sort of flush (which QFile has but QProcess does not), but I could be wrong.

Added after 1 hour 14 minutes:

I've found a solution to my problem.

Rather than writing directly to the QProcess, I instead write to a QByteArray via a QBuffer and then manually flush the QBuffer into the QProcess at regular intervals. I was able to reduce my file's write time from the awful 170 seconds I was seeing to approximately 4 seconds, only about 50% slower than running the compressor as a stand-alone process.

It basically boils down to the fact that the buffer built into QProcess is designed for transmitting relatively short strings between processes, not large data blocks. The result was that the buffer got flushed WAY more often than it needed to.

In the end, my write speeds are about the same as I would have gotten writing to the disk first and running it in stand-alone mode after the fact, but at least I avoid the intermediate files.

Sample code:


/* set up compressor sub-process */
QProcess* compressor = new QProcess();
connect(compressor, SIGNAL(readyReadStandardError()), this, SLOT(retrieveStdErr()));
connect(compressor, SIGNAL(readyReadStandardOutput()), this, SLOT(retrieveStdOut()));

qDebug("Starting compressor...");
compressor->start("d:/tools/compressor.exe", args, QIODevice::ReadWrite);
if(!compressor->waitForStarted(3000))
{
    qWarning("Could not start compressor. Aborting write.");
    delete compressor;
    return false;
}

/* QProcess doesn't have a big enough buffer - set up an intermediate one */
QByteArray dataBuffer;
QBuffer zip_buffer;
zip_buffer.setBuffer(&dataBuffer);
zip_buffer.open(QIODevice::ReadWrite);
QDataStream io(&zip_buffer);

/* start processing */
int interval = 50000;
for(int i = 0; i < ThingList.size(); ++i)
{
    const Thing& it = ThingList.at(i);
    io << it;

    /* flush the buffer after writing a lot of things */
    if(i % interval == 0)
    {
        compressor->write(zip_buffer.data());
        zip_buffer.seek(0);
    }
}
compressor->write(zip_buffer.data());
compressor->closeWriteChannel();    /* signal end-of-input so the compressor can finish */

/* give the compressor time to catch up */
if(!compressor->waitForFinished(5000))
{
    qWarning("Too slow!");
    compressor->deleteLater();
    return false;
}

compressor->deleteLater();
return true;

wysota
20th March 2013, 01:18
Second, when the actual data transfer between processes occurs, it is incredibly slow. The transfer rate appears to be around 400 KB/s. When I compress a file in stand-alone mode, it takes approximately 1.369 seconds to run a 68 MB file, so I know that the compression itself is relatively fast. The same file, when piped, takes approximately 170 seconds, and the CPU reports 0% load for both processes.

I suspect that the slow performance has something to do with the fact that QProcess is optimized for basic text communication between processes (very small buffers), not for huge data transfers, but I see no way to change the buffer size for QProcess so that I can test this theory.
I'd start by discarding QDataStream from the pipeline.

Phlucious
20th March 2013, 01:25
I'd start by discarding QDataStream from the pipeline.

I have a lot of respect for your opinion, so I'll bite. Why dump QDataStream? I elected to use QDataStream because it handles the conversion from BigEndian (my system) to LittleEndian (file format) for me.

Added after 4 minutes:

To add to my earlier post, the situation where QProcess isn't receiving anything until I wait for it seems to be related to returning to the event loop. No data appears to transfer to the sub-process until I either trigger qApp->processEvents() or QProcess::waitForFinished(). Why is that?

Repeatedly calling processEvents() slows down the write process and seems to run exactly contrary to the idea of asynchronous processing.

wysota
20th March 2013, 01:41
Why dump QDataStream? I elected to use QDataStream because it handles the conversion from BigEndian (my system) to LittleEndian (file format) for me.
Because it doesn't do what you probably think it does. It adds significant overhead to processing time. If you try to write a large blob of data in one go into a buffer that can't accept that much data, you get stuck.


To add to my earlier post, the situation where QProcess isn't receiving anything until I wait for it seems to be related to returning to the event loop. No data appears to transfer to the sub-process until I either trigger qApp->processEvents() or QProcess::waitForFinished(). Why is that?
Because Qt is an event-driven framework. If the write happened immediately, you'd be blocked until all of it had completed.


Repeatedly calling processEvents() slows down the write process and seems to run exactly contrary to the idea of asynchronous processing.
So don't call processEvents() but rather rely on the bytesWritten() signal being emitted. If you keep pushing data into a pipe that is already full, you end up being blocked until the other end manages to make some room in the buffer. Your current solution is error prone as you can easily run out of memory.
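
A minimal sketch of what I mean, driving the writes from the bytesWritten() signal (the class, member names and chunk size below are placeholders, not your code):

#include <QObject>
#include <QProcess>
#include <QByteArray>

/* sketch: push the payload into the QProcess in chunks; the next chunk is
   sent only after the previous one has actually left QProcess's buffer */
class ChunkedWriter : public QObject
{
    Q_OBJECT
public:
    ChunkedWriter(QProcess* proc, const QByteArray& payload, QObject* parent = 0)
        : QObject(parent), m_proc(proc), m_payload(payload), m_pos(0)
    {
        connect(m_proc, SIGNAL(bytesWritten(qint64)), this, SLOT(writeNextChunk()));
        writeNextChunk();                                   // kick off the first chunk
    }

private slots:
    void writeNextChunk()
    {
        if(m_pos >= m_payload.size())
        {
            m_proc->closeWriteChannel();                    // no more data: close the child's stdin
            return;
        }
        const qint64 chunkSize = 64 * 1024;                 // arbitrary chunk size
        qint64 written = m_proc->write(m_payload.constData() + m_pos,
                                       qMin(chunkSize, qint64(m_payload.size()) - m_pos));
        if(written > 0)
            m_pos += written;
    }

private:
    QProcess* m_proc;
    QByteArray m_payload;
    qint64 m_pos;
};

This way at most one chunk sits in QProcess's internal buffer at a time and the event loop stays responsive.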

Phlucious
20th March 2013, 02:09
Because it doesn't do what you probably think it does. It adds significant overhead to processing time. If you try to write a large blob of data in one go into a buffer that can't accept that much data, you get stuck.

Well, the QDataStream is used for passing tens of thousands of small records into a QBuffer, as shown in the sample code. I won't deny the possibility that it's adding to my overhead, but I haven't had time to implement the byte-swapping code needed to switch between Big/Little-Endian. Since QBuffer lives in my RAM, not in the pipe, doesn't that bypass that particular issue?


Because Qt is an event-driven framework. If the write happened immediately, you'd be blocked until all of it had completed.

Makes sense, but...


So don't call processEvents() but rather rely on the bytesWritten() signal being emitted. If you keep pushing data into a pipe that is already full, you end up being blocked until the other end manages to make some room in the buffer. Your current solution is error prone as you can easily run out of memory.

Doesn't the pipe get drained by the receiving sub-process? If that's the case, then I don't understand why my main process would have to return to its event loop, even temporarily, to allow a separate process the opportunity to drain the pipe. Aren't they running in parallel?

I don't know if it's relevant, but I'm spawning this QProcess in a child thread, not the main GUI thread.

Added after 18 minutes:

I see. The fact that QProcess::write() returns a value does not mean that those bytes were actually written. I added a QProcess::waitForBytesWritten() call immediately after the QProcess::write() call, which added only a small bit of overhead. I guess that makes sense. Thanks again, wysota.

wysota
20th March 2013, 02:29
Well, the QDataStream is used for passing tens of thousands of small records into a QBuffer, as shown in the sample code. I won't deny the possibility that it's adding to my overhead, but I haven't had time to implement the byte-swapping code needed to switch between Big/Little-Endian.
At the same time it expands your data. If you write 1000 records, 10 bytes each, you end up writing more than 10000 bytes. QDataStream is a serialization mechanism and not a general purpose binary stream. If that's really what you're after, that's OK; just bear in mind that you won't be able to read that data back without QDataStream (e.g. using software not based on Qt).
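
A quick illustration of that expansion (a standalone sketch, not from your code):

#include <QBuffer>
#include <QByteArray>
#include <QDataStream>
#include <QDebug>

int main()
{
    QByteArray record(10, 'x');            // a 10-byte record

    QByteArray out;
    QBuffer buffer(&out);
    buffer.open(QIODevice::WriteOnly);

    QDataStream stream(&buffer);
    stream << record;                      // serialized, not copied verbatim

    qDebug() << out.size();                // prints 14: a 4-byte length prefix precedes the data
    return 0;
}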


Since QBuffer lives in my RAM, not in the pipe, doesn't that bypass that particular issue?
You then try to push the large blob into the pipe again.


Doesn't the pipe get drained by the receiving sub-process? If that's the case, then I don't understand why my main process would have to return to its event loop, even temporarily, to allow a separate process the opportunity to drain the pipe. Aren't they running in parallel?
It doesn't matter if they run in parallel or not. What matters is that calling write() doesn't actually write the data. The same happens with sockets.


I don't know if it's relevant, but I'm spawning this QProcess in a child thread, not the main GUI thread.
Completely not relevant.


The fact that QProcess::write() returns a value does not mean that those bytes were actually written.
Yes.


I added a QProcess::waitForBytesWritten() call immediately after the QProcess::write() call, which added only a small bit of overhead.

which doesn't solve the issue of running out of memory when trying to allocate a huge blob at once. You should divide your data into chunks and write it chunk by chunk to limit memory usage. Imagine what happens if you have 2GB of data to process on a 32-bit machine. Your process would be killed instantly because you'd try to allocate at least 4GB of memory (2GB for "dataBuffer" and another 2GB for what gets into compressor->write).

Phlucious
20th March 2013, 16:36
At the same time it expands your data. If you write 1000 records, 10 bytes each, you end up writing more than 10000 bytes. QDataStream is a serialization mechanism and not a general purpose binary stream.

I guess I assumed that QDataStream is a general purpose binary stream. I'll have to re-evaluate that assumption. Now it's starting to sound more like an easy/lazy way to save custom data types to disk, which certainly isn't what I'm trying to accomplish. I'm trying to write a very specific file with very clearly defined platform-agnostic specifications.


What matters is that calling write() doesn't actually write the data. The same happens with sockets.

Easily the most useful thing I've learned today. It'd be nice if the documentation noted that write() queues the data to be written rather than performing the actual write at that time. I suppose that's probably obvious to someone with a more formal CS background, though...


which doesn't solve the issue of running out of memory when trying to allocate a huge blob at once. You should divide your data into chunks and write it chunk by chunk to limit memory usage. Imagine what happens if you have 2GB of data to process on a 32-bit machine. Your process would be killed instantly because you'd try to allocate at least 4GB of memory (2GB for "dataBuffer" and another 2GB for what gets into compressor->write).

Thanks to our discussion, this is what I consider the solution. I'm streaming into a 100 KB buffer instead of into the QProcess, in small chunks rather than one massive 4 GB blob. Every time I detect my buffer starting to get full, a custom flush() method pauses everything to send the buffer to the QProcess, waiting until waitForBytesWritten() returns. Once the buffer is purged, writing resumes. That way I'm sure I'm not overloading the pipe, and the short pause doesn't seem to noticeably hurt performance.
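
Roughly, the flush() helper ended up looking something like this (a simplified sketch with an arbitrary timeout, not the exact code):

/* sketch of the buffer-and-flush approach described above */
bool flushBufferToProcess(QBuffer& zip_buffer, QProcess* compressor)
{
    /* send only the bytes written since the last flush */
    compressor->write(zip_buffer.buffer().constData(), zip_buffer.pos());

    /* block until the data has actually left QProcess's internal buffer */
    if(!compressor->waitForBytesWritten(30000))
    {
        qWarning("Compressor did not accept data in time.");
        return false;
    }

    /* rewind so the next records overwrite the buffer from the start */
    zip_buffer.seek(0);
    return true;
}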

wysota
20th March 2013, 18:31
I guess I assumed that QDataStream is a general purpose binary stream. I'll have to re-evaluate that assumption. Now it's starting to sound more like an easy/lazy way to save custom data types to disk, which certainly isn't what I'm trying to accomplish. I'm trying to write a very specific file with very clearly defined platform-agnostic specifications.
QDataStream will not help you much with that. If you're after changing endianness, you can use qToLittleEndian() and family.
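
For example (a trivial sketch, the function name is made up):

#include <QtEndian>
#include <QByteArray>

/* append a 32-bit value to 'out' in little-endian byte order,
   regardless of the host's native endianness */
void appendUInt32LE(QByteArray& out, quint32 value)
{
    quint32 le = qToLittleEndian(value);
    out.append(reinterpret_cast<const char*>(&le), sizeof(le));
}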



It'd be nice if the documentation noted that write() queues the data to be written rather than performing the actual write at that time.
It does say that:


Certain subclasses of QIODevice, such as QTcpSocket and QProcess, are asynchronous. This means that I/O functions such as write() or read() always return immediately, while communication with the device itself may happen when control goes back to the event loop.

Phlucious
20th March 2013, 18:45
If you're after changing endianness, you can use qToLittleEndian() and family.

Awesome. I completely forgot about that family of functions. Thanks again!