What is the best way to use QDataStream or QFile



lni
29th September 2011, 08:50
Hi,

I have a large binary file consisting of 3000 x 3000 x 2000 floating-point data points (around 67 GB in size). For each interval of 2000 values I need to read one float, so in total I need to read 3000 x 3000 points. What is the fastest way to read them?

I did it the following way, and it takes ages to read everything out.


QFile file( ... );
QDataStream stream( &file );

qint64 pos( 0 );
char data[ 4 ];
for ( int ix = 0; ix < 3000; ix++ ) {
    for ( int iy = 0; iy < 3000; iy++ ) {
        stream.device()->seek( pos );
        stream.readRawData( data, 4 );
        ...
        pos += 2000;
    }
}

ugluk
29th September 2011, 09:02
Well, if the Qt built-in approaches don't work, try an external library (like Boost) that supports memory-mapped files. I've read that reading sometimes goes faster that way.

cincirin
29th September 2011, 09:15
Well, if the Qt built-in approaches don't work, try an external library (like Boost) that supports memory-mapped files. I've read that reading sometimes goes faster that way.

The Qt framework already has memory-mapped file support (http://doc.qt.nokia.com/latest/qfile.html#map).
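
A rough sketch of how that could look (untested; the file name, the window size, and the 64-bit build are my assumptions, since a 67 GB file cannot be mapped in one piece anyway):

#include <QFile>
#include <QVector>

// Sketch only: map a window of the file and pick every 2000th float.
// "data.bin" and the window size are assumptions, not the original setup.
QVector<float> readWindow()
{
    QVector<float> result;
    QFile file( "data.bin" );
    if ( !file.open( QIODevice::ReadOnly ) )
        return result;

    const qint64 stride = 2000;                       // one wanted float per 2000 values
    const qint64 count  = 3000;                       // values to extract from this window
    const qint64 bytes  = count * stride * sizeof( float );

    uchar *raw = file.map( 0, bytes );                // map only the part we need
    if ( !raw )
        return result;

    const float *values = reinterpret_cast<const float *>( raw );
    for ( qint64 i = 0; i < count; ++i )
        result.append( values[ i * stride ] );        // first float of each interval

    file.unmap( raw );
    return result;
}

For the full file you would move the mapped window along in steps rather than mapping everything at once.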

wysota
29th September 2011, 10:22
I somehow doubt he'll be able to map 67 gigabytes into memory. I'd say that reading 9 million floats (which gives at least 36 megabytes of data) just has to take time, especially if one reads them one value at a time. A trivial optimization is to read all values at once and then iterate over them.

The ordering of data in the file is poor as well. Skipping 2 kB of data (why 2 and not 8? 2000 values of 4 bytes each is 8 kB) before each read significantly limits the disk cache hit ratio; even with a 32 MB cache you will get lots of cache misses. Nothing will change that, not even mapping the file to memory (unless you have enough physical RAM to map the whole file at once). If you can't change the file structure then I suggest you invest in a faster disk (SSD?) or more RAM (96 GB should do).
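
The "read in bulk and iterate" idea could look roughly like this (untested sketch; the file name, the ~8 MB block size, and the assumption that the wanted value is the first float of each 2000-float interval are mine):

#include <QFile>
#include <QVector>

// Sketch only: stream the file sequentially in large blocks and keep every
// 2000th float, instead of seeking before each 4-byte read. It still reads
// the whole file, but sequential reads are much kinder to the disk cache.
QVector<float> readEvery2000th()
{
    QVector<float> wanted;
    QFile file( "data.bin" );                          // assumed file name
    if ( !file.open( QIODevice::ReadOnly ) )
        return wanted;

    const qint64 stride      = 2000;                   // floats per interval
    const qint64 floatsPerIo = stride * 1024;          // ~8 MB per read, tune as needed
    QVector<float> buffer( floatsPerIo );

    qint64 globalIndex = 0;                            // index of the next float in the file
    while ( !file.atEnd() ) {
        qint64 bytesRead = file.read( reinterpret_cast<char *>( buffer.data() ),
                                      floatsPerIo * sizeof( float ) );
        if ( bytesRead <= 0 )
            break;
        qint64 floatsRead = bytesRead / sizeof( float );
        for ( qint64 i = 0; i < floatsRead; ++i, ++globalIndex )
            if ( globalIndex % stride == 0 )
                wanted.append( buffer[ i ] );          // first float of each interval
    }
    return wanted;
}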

lni
1st October 2011, 05:22
I somehow doubt he'll be able to map 67 gigabytes into memory. I'd say that reading 9 million floats (which gives at least 36 megabytes of data) just has to take time, especially if one reads them one value at a time. A trivial optimization is to read all values at once and then iterate over them.

The ordering of data in the file is poor as well. Skipping 2 kB of data (why 2 and not 8? 2000 values of 4 bytes each is 8 kB) before each read significantly limits the disk cache hit ratio; even with a 32 MB cache you will get lots of cache misses. Nothing will change that, not even mapping the file to memory (unless you have enough physical RAM to map the whole file at once). If you can't change the file structure then I suggest you invest in a faster disk (SSD?) or more RAM (96 GB should do).

You are absolutely right. It should be "pos += 8000".

I am restructuring the data so it can be read block by block to maximize disk cache use. What is the best block size to use? Is there a way to query this parameter from within the program?

Many thanks!

ChrisW67
1st October 2011, 06:42
If you are restructuring the file so that each 3000x3000 block is contiguous in the file, then you want to read 9000000 * sizeof(float) bytes at a time. Provided you can allocate the required 36000000-byte (or bigger) block of memory to receive the result, just ask the operating system to read it and it will do it as fast as is possible. If you break the read into 3000 reads of 3000 floats, or something else, you only add overhead for little gain.
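
Reading one such restructured plane could look roughly like this (untested sketch; the file name and the plane-by-plane layout are my assumptions):

#include <QFile>
#include <QVector>

// Sketch only: read one contiguous 3000x3000 plane of floats with a single
// read() call. "restructured.bin" is an assumed name; error handling is minimal.
QVector<float> readPlane( qint64 planeIndex )
{
    const int    planeFloats = 3000 * 3000;                    // 9,000,000 floats
    const qint64 planeBytes  = qint64( planeFloats ) * sizeof( float ); // 36,000,000 bytes

    QVector<float> plane( planeFloats );
    QFile file( "restructured.bin" );
    if ( !file.open( QIODevice::ReadOnly ) )
        return QVector<float>();

    file.seek( planeIndex * planeBytes );                      // jump to the wanted plane
    if ( file.read( reinterpret_cast<char *>( plane.data() ), planeBytes ) != planeBytes )
        return QVector<float>();

    return plane;
}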