reading a binary file full of floats and/or doubles



OzQTNoob
8th February 2012, 06:04
Okay, so I have been trying to read in a binary file that is full of floats (little endian, single precision) and have managed to do that with QDataStream (see code below). The problem is that these files are of the order of 531x50000 floats, where each set of 531 values represents a spectrum. If I loop over the whole file with two for loops (e.g. for t=0,530 and for i=0,49999) then it is really quite slow, even though the whole file is only 54MB. As far as I can tell, I can only read in one float at a time. Ideally I don't want to read the entire file (sometimes I might though), but I would prefer to read in an entire spectrum in one go, e.g. all 531 float values at once, rather than using nested loops. Can anyone point me in the right direction?


qint64 pos( 0);
float bob;
QVector<float> vect(531);
bool ok = 1;
.......

QFile file(fileName);
file.open(QIODevice::ReadOnly );
QDataStream stream( &file );
stream.setByteOrder(QDataStream::LittleEndian);
stream.setFloatingPointPrecision(QDataStream::SinglePrecision);
stream.device()->reset();
for(int i=0;i<531;i++){
stream.device()->seek( pos );
stream >> bob;
vect[i] = bob;
pos +=4;
stream.device()->reset();
}


I also tried the above using readRawData but couldn't get it to work at all, as the toFloat function always returns FALSE. I tried this to see if I could get it to work, so that I could then make my datain array have a size of 4x531 and just read in a whole spectrum in one go. What did I do wrong? I am a complete noob as well.



qint64 pos( 0);
QByteArray datain;
datain.resize(4);
float bob2;
bool ok = 1;

...
QFile file(fileName);
file.open(QIODevice::ReadOnly );
QDataStream stream( &file );
stream.setByteOrder(QDataStream::LittleEndian);
stream.setFloatingPointPrecision(QDataStream::SinglePrecision);
stream.device()->reset();
for(int i=0;i<531;i++){
stream.device()->seek( pos );
stream.readRawData( datain.data(), datain.size() );
bob2 = datain.toFloat(&ok); // <- THIS ALWAYS RETURNS A FALSE VALUE :(
pos +=4;
stream.device()->reset();
}



I should add that I have got the seek(pos) option in there because I was playing around to see what it was reading. If I read my data in from different parts of the file then I will have to use seek.
Cheers
Oz
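
Editor's note on the second snippet: QByteArray::toFloat() parses the bytes as a textual number (for example "1.5"), so it reports failure when handed the four raw bytes of a binary float. A minimal sketch of the raw reinterpretation in standard C++ follows. The function name floatFromRawBytes is hypothetical, and the sketch assumes a 32-bit IEEE-754 float whose bytes are already in the host's byte order.

```cpp
#include <cstring>

// Hypothetical helper: reinterpret 4 raw bytes as a float.
// Assumes the bytes are in host byte order and float is 32-bit IEEE-754.
float floatFromRawBytes(const char *bytes)
{
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    float value;
    // memcpy copies the bit pattern unchanged; no text parsing is involved.
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

In the loop above, something like bob2 = floatFromRawBytes(datain.constData()) would take the place of the toFloat call, provided the machine is little-endian like the file.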

myta212
8th February 2012, 10:00
Hi,
I use standard C/C++ functions for reading and writing binary files. I use this method to read Seismic Unix files and other binary files. I think you can try this method, because it is simpler.

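#include <stdio.h>
#include <stdlib.h>

/* read binary float file, return as array of float (caller must free() it) */
float *readFloatBinaryFile(const char *filename, int ndata)
{
    FILE *infile = NULL;
    float *dataread;

    /* open file in binary mode ("rb"); text mode "r" can alter bytes on Windows */
    infile = fopen(filename, "rb");
    if (!infile) /* show message and exit if fail opening file */
    {
        printf("error opening file %s \n", filename);
        exit(EXIT_FAILURE);
    }

    /* allocate zero-initialised memory/array for float data */
    dataread = (float*) calloc(ndata, sizeof(float));
    if (!dataread)
    {
        printf("error allocating memory for %d floats \n", ndata);
        fclose(infile);
        exit(EXIT_FAILURE);
    }

    /* read file and save as float data; warn if the file was too short */
    if (fread(dataread, sizeof(float), ndata, infile) != (size_t) ndata)
        printf("warning: fewer than %d floats read from %s \n", ndata, filename);

    /* close file */
    fclose(infile);

    return dataread;
}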

This code uses C, but you can change it to C++. Example
in C :
dataread = (float*) calloc(ndata, sizeof(float));
in C++ :
dataread = new float[ndata](); // square brackets: new float(ndata) would allocate a single float; release with delete[]

I got this tutorial from here : http://toto-share.com/2011/11/cc-read-binary-file/
Thank you,

Best regards,

Myta

wysota
8th February 2012, 10:30
QDataStream is not a general purpose binary read/write stream. Don't use it unless you are sure you should be using it. Use QFile API instead.

OzQTNoob
8th February 2012, 23:49
Thanks for the replies guys/gals.

I will try and avoid the C/C++ route if I can as I want to keep everything Qt if I can. I will investigate the QFile option more thoroughly. As long as I can set my read position aka seek then I should be alright (QIODevice). I did go over my code last night and cleaned it up somewhat. Being the noob that I am I still need to get used to not having a bunch of really convenient data reading/writing tools as I had in IDL (Look for RSI ENVI if you are curious).

Cheers
Oz

ChrisW67
9th February 2012, 01:42
If your reading machine is little-endian (most are) then the problem should be trivial:


const int vectSize = 531;
const int blockSize = vectSize * sizeof(float);
QVector<float> vect(vectSize, 0.0f);

int desiredBlock = 0; // zero based, e.g 0..49999
QFile file("data");
if (file.open(QIODevice::ReadOnly)) {
    if (file.seek(desiredBlock * blockSize)) {
        qint64 bytes = file.read(reinterpret_cast<char*>(vect.data()), blockSize);
        if (bytes != blockSize)
            qFatal("Oops!");

        qDebug() << vect;
    }
}

Effectively just dropping the raw bytes into the internal buffer of the QVector.

You can use:


#if Q_BYTE_ORDER == Q_BIG_ENDIAN
// byte swapping code
#endif

at line 12 of the first snippet (the blank line just before the qDebug() call) if you need portability.
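
Editor's note: the "byte swapping code" placeholder could be filled in along the following lines. This is only a sketch in standard C++, not the poster's code. The names hostIsLittleEndian and swapFloatBytes are hypothetical, a std::vector stands in for the QVector, and a 32-bit float is assumed.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true when the host stores the least significant byte first.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: reverse the 4 bytes of every float in place.
void swapFloatBytes(std::vector<float> &values)
{
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    for (float &v : values) {
        unsigned char raw[sizeof(float)];
        std::memcpy(raw, &v, sizeof raw);     // copy the bit pattern out
        std::reverse(raw, raw + sizeof raw);  // flip the byte order
        std::memcpy(&v, raw, sizeof raw);     // copy it back
    }
}
```

For a little-endian file, the swap would only be called when hostIsLittleEndian() returns false; for a big-endian file on a PC it would be called when it returns true.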

wysota
9th February 2012, 01:51
Qt is C++, so everywhere you use Qt, C++ mechanisms will work as well.

As for convenient tools -- I don't know what conveniences you need to read four bytes from a file. Your current code is not very optimal, for instance performing seek in every iteration is a waste of time since the file pointer is already positioned in the right place.

The fastest way to read, say... 1000 floats is the following:


QFile file(...);
file.open(...);
file.seek(...);
float array[1000];
QByteArray ba = file.read(1000*sizeof(float)); // reading 4000 bytes is faster than reading 4 bytes 1000 times (even though the disk caches the data)
memcpy(&array, ba.constData(), ba.size());

Of course all that provided the interpretation of float on your machine is the same as the one in the file. Otherwise you'll need to iterate over the byte array and unmangle bytes properly.
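
Editor's note: one way to "iterate over the byte array and unmangle bytes" is to assemble each 32-bit pattern from its individual bytes with shifts, which gives the same result whatever the byte order of the reading machine. The sketch below is standard C++ with a hypothetical function name, decodeLittleEndianFloats. It assumes the file holds little-endian IEEE-754 single-precision values and that the host float is also 32-bit IEEE-754.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: decode little-endian IEEE-754 floats from a raw buffer.
// Trailing bytes that do not make up a whole float are ignored.
std::vector<float> decodeLittleEndianFloats(const unsigned char *bytes,
                                            std::size_t byteCount)
{
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    std::vector<float> out(byteCount / 4);
    for (std::size_t i = 0; i < out.size(); ++i) {
        const unsigned char *p = bytes + 4 * i;
        // The first byte in the file is the least significant one.
        const std::uint32_t bits = static_cast<std::uint32_t>(p[0])
                                 | static_cast<std::uint32_t>(p[1]) << 8
                                 | static_cast<std::uint32_t>(p[2]) << 16
                                 | static_cast<std::uint32_t>(p[3]) << 24;
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}
```

For a big-endian file the shift amounts would simply be applied in the opposite order (p[0] shifted by 24 down to p[3] shifted by 0).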

OzQTNoob
9th February 2012, 03:56
Hi Chris and Wysota,

both methods worked a treat and taught me a little bit more about C++ along the way so thanks for that. All of my work and code is on a PC and will more than likely stay that way. Occasionally though I may get an image file (talking remote sensing image here e.g. spectral imagery) that has come from a BigEndian machine. In this case I can always identify it from its header file (An ascii file). If I am going to swap the endian of the data (assuming that happens) then I will have a look at methods for that (I am sure there are plenty of existing examples around).

In relation to the 2 code snippets you guys posted, is there any reason to adopt one over the other? And as an aside, if I don't know the actual array sizes prior to runtime (they can actually vary a lot depending on the instrument used to collect the spectra), I assume that I can replace the expressions that use the const terms etc. with Qt dynamic equivalents, e.g. use resize with the QVector and/or QByteArray? I will try this with the code snippets I got from you guys and see what happens :)

Cheers
Oz
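
Editor's note on the runtime-size question: the block size does not need to be a compile-time constant, since a dynamically sized container can be sized just before the read. The sketch below uses standard C++ streams so that it is self-contained; the function name readSpectrum and its parameters are hypothetical. With Qt, the same idea would be to call QVector::resize() before QFile::read(). It assumes the file's floats match the host's float format and byte order.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read spectrum number `index` (zero based), made of
// `bandCount` floats, from a raw binary file. Returns an empty vector if the
// file cannot be opened or the block cannot be read completely.
std::vector<float> readSpectrum(const std::string &path,
                                std::size_t bandCount,
                                std::size_t index)
{
    std::vector<float> spectrum(bandCount);  // sized at run time
    const std::streamsize blockBytes =
        static_cast<std::streamsize>(bandCount * sizeof(float));

    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};

    // Jump straight to the requested block, then read it in one call.
    in.seekg(static_cast<std::streamoff>(index) * blockBytes);
    in.read(reinterpret_cast<char *>(spectrum.data()), blockBytes);
    if (in.gcount() != blockBytes)
        return {};

    return spectrum;
}
```

The band count would typically come from the header file that accompanies the data.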

ChrisW67
9th February 2012, 04:32
What is producing your spectra? ... or would you have to kill me if you told me ;)

OzQTNoob
9th February 2012, 04:47
Hi Chris,

no killing needed. The spectra can come from a number of sources such as an ASD field spectrometer (400-2500nm spectral region) or from airborne hyperspectral imagers such as HyMap (an Australian sensor) to name but 2. The datasets that we work with tend to come with an ENVI header file and almost always in the form of a spectral library or a multiband image file (has been adopted by a very large portion of the remote sensing community). Most of my work, although not all of it, is about getting at what minerals are present in a spectral dataset via the spectral absorption features in a given spectrum, and/or correcting the airborne imagery for atmospheric effects etc.

I have written lots of code in IDL to process the various datasets (I work for CSIRO), but the problem that I have with IDL is that it requires licensing, and therefore we have to keep coming up with money to fund our licences. Ultimately IDL is an interpreted language that defers back to C anyway, and the for loops in IDL are painfully slow if you have to use them. Plus, Qt still lets me use UIs and is a slightly easier way for me to ease into C++ programming as well :)

ChrisW67
9th February 2012, 05:10
Cool. Only ask because there is much spectra processing in astronomy (one of my quals).

OzQTNoob
9th February 2012, 05:31
Indeed, in fact there are whole IDL libraries dedicated to astronomy software and routines, idlastro.gsfc.nasa.gov to name but one. Some of my best routines have been lifted from astro code :)
"To steal the work of one is plagiarism, to steal the work of many is research"