qUncompress data from gzip

Talei
20th April 2010, 13:20
Hello,
I have been struggling with this for almost a month, with breaks of course, and can't figure out a way to decompress gzip data received over HTTP.
I use the following code in the slot connected to QNetworkAccessManager's SIGNAL( finished(QNetworkReply *) ):


const QByteArray ba = reply->readAll();
QByteArray dataPlusSize;
const unsigned size = ba.size();

dataPlusSize.prepend( ((size >> 24) & 0xFF));
dataPlusSize.prepend( ((size >> 16) & 0xFF));
dataPlusSize.prepend( ((size >> 8) & 0xFF));
dataPlusSize.prepend( ((size >> 0) & 0xFF));

dataPlusSize.prepend(ba);

dataPlusSize = qUncompress(dataPlusSize);
qDebug() << dataPlusSize;
and the output is, as always when the data is wrong, "qUncompress: Z_DATA_ERROR: Input data is corrupted".
So my question is: does readAll() actually return ONLY the gzip data, or some additional chunk of data? Or maybe I am doing something wrong? (I checked with Wireshark and the data is indeed gzipped; I tested with qCompress and everything works fine, so my only conclusion is that readAll() returns something more.)

Best regards

Talei
21st April 2010, 05:50
After some more reading, I found out that gzip stores the uncompressed data size in the last 4 bytes (hence the 4 GB limit on the size that field can record), whereas qUncompress expects that information in the first 4 bytes, not the last. I also found that the data from readAll() is indeed a gzip stream.
So my only problem, it seems, is big-endian byte order. I don't know, or don't quite get from the docs, whether only the size needs to be big-endian or the entire compressed data?
My code so far:

const char dat[40] = {
0x1F, 0x8B, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xAA, 0x2E, 0x2E, 0x49, 0x2C, 0x29,
0x2D, 0xB6, 0x4A, 0x4B, 0xCC, 0x29, 0x4E, 0xAD, 0x05, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x03, 0x00,
0x2A, 0x63, 0x18, 0xC5, 0x0E, 0x00, 0x00, 0x00
};

// dat is the gzip-compressed string {status:false}
// the last 4 bytes of a gzip stream hold the uncompressed length; 0x0E = 14, which is correct
QByteArray data;
data.fromRawData( dat, 40);

QByteArray uncomp = qUncompress( data );
qDebug() << uncomp;
I also tried moving the last 4 bytes to the beginning, without success. (I tried placing 0x0E both first and last within the 32-bit word.)
So if there is a kind soul who could point out my mistake, I would be more than grateful.
Best regards

JohannesMunk
25th April 2010, 16:48
Hi Talei!

You are using fromRawData incorrectly. It's a static member that returns a QByteArray, so its result has to be assigned.



#include <QtCore>
#include <QtGui>

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);

    // Round trip through qCompress/qUncompress, for comparison
    QString test = "{status:false}";
    QByteArray ba = qCompress(test.toUtf8());
    qDebug() << "compressed: " << ba.toHex();
    qDebug() << "uncompressed: " << qUncompress(ba);

    const char dat[40] = {
        0x1F, 0x8B, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xAA, 0x2E, 0x2E, 0x49, 0x2C, 0x29,
        0x2D, 0xB6, 0x4A, 0x4B, 0xCC, 0x29, 0x4E, 0xAD, 0x05, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x03, 0x00,
        0x2A, 0x63, 0x18, 0xC5, 0x0E, 0x00, 0x00, 0x00
    };

    // dat is the gzip-compressed string {status:false}
    // the last 4 bytes of a gzip stream hold the uncompressed length; 0x0E = 14, which is correct
    // fromRawData is static: use the QByteArray it returns
    QByteArray data = QByteArray::fromRawData(dat, 40);
    qDebug() << "faulty data: " << data.toHex();
    QByteArray uncomp = qUncompress( data );
    qDebug() << uncomp.toHex();

    return 0;
}


Have you read: http://doc.qt.nokia.com/4.6/qbytearray.html#qUncompress ?

It states that you just need to prepend the size in big-endian order. Nothing else. Correct me if I'm wrong, but the prepend snippet you provided puts ba in front of the size bytes, so effectively the size is appended at the end, in "most significant byte last" order.

Could you provide your dat array just the way you read it from the file?
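
The ordering effect described above can be checked without Qt. The following is a minimal sketch in standard C++; `sizeByAppend` and `sizeByPrepend` are hypothetical helpers written for illustration only, with a std::vector standing in for QByteArray. Appending the four bytes from the highest shift to the lowest gives big-endian order, while prepending them in that same sequence reverses them. Note that the size meant by the documentation is the expected uncompressed size, and that a correct prefix alone does not turn a gzip stream into the zlib-format data qUncompress works on.

```cpp
#include <cstdint>
#include <vector>

// Append the four size bytes, highest shift first.
// Result: most significant byte first (big-endian).
std::vector<std::uint8_t> sizeByAppend(std::uint32_t size)
{
    std::vector<std::uint8_t> out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFF));
    return out;
}

// Prepend the same bytes in the same sequence.
// Each new byte lands in front of the previous ones, so the order is reversed
// and the least significant byte ends up first (little-endian).
std::vector<std::uint8_t> sizeByPrepend(std::uint32_t size)
{
    std::vector<std::uint8_t> out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.insert(out.begin(), static_cast<std::uint8_t>((size >> shift) & 0xFF));
    return out;
}
```

For a size of 14, the first helper yields 00 00 00 0e and the second yields 0e 00 00 00.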

Johannes

Talei
26th April 2010, 04:35
Thank you very much for the input.
So first things first: my second snippet indeed doesn't prepend the size at the beginning, but I have tried it that way as well:


// gzip stream, with header and trailer
static const char dat[40] = {
0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xaa, 0x2e, 0x2e, 0x49, 0x2c, 0x29,
0x2d, 0xb6, 0x4a, 0x4b, 0xcc, 0x29, 0x4e, 0xad, 0x05, 0x00, 0x00, 0x00, 0xff, 0xff, 0x03, 0x00,
0x2a, 0x63, 0x18, 0xc5, 0x0e, 0x00, 0x00, 0x00
};
QByteArray data = QByteArray::fromRawData(dat, 40);
unsigned int size = 14; // expected uncompressed size, taken from the last 4 bytes: 0x0e, 0x00, 0x00, 0x00 = 14

QByteArray dataPlusSize; // empty array; the uncompressed size goes at the beginning
// big-endian order: most significant byte first
dataPlusSize.append( (char)((size >> 24) & 0xFF));
dataPlusSize.append( (char)((size >> 16) & 0xFF));
dataPlusSize.append( (char)((size >> 8) & 0xFF));
dataPlusSize.append( (char)((size >> 0) & 0xFF));

dataPlusSize.append( data );
QByteArray uncomp = qUncompress( dataPlusSize );
qDebug() << uncomp;


qDebug() prints the size and data as:

//data
"1f8b0800000000000003aa2e2e492c292db64a4bcc294ead05000000ffff03002a6318c50e000000" data size: 40
//dataPlusSize, this is what I want to qUncompress
"0000000e1f8b0800000000000003aa2e2e492c292db64a4bcc294ead05000000ffff03002a6318c50e000000" dataPlusSize size: 44

dat[] is a gzip stream captured with Wireshark, and it really is a gzip stream (when I save it to a file, e.g. data.gzip, both zip and rar tools can open and decompress it, and so can Wireshark).
It consists of a 10-byte header, the DEFLATE payload, and an 8-byte trailer (4-byte CRC32 + 4-byte uncompressed size).
The gzip header is laid out as follows (in my numbering the first byte is at position 1, not 0):
* Header size: 10 bytes
* First byte: ID1 = 0x1F
* Second byte: ID2 = 0x8B
* Third byte: CM, the compression method: 0-7 are reserved, 0x08 == DEFLATE
* Byte 4: FLG, flags announcing optional fields such as file name, comment, or a header CRC16
* Bytes 5-8: MTIME, the modification time; byte 9: XFL, extra flags
* Byte 10: OS (operating system). In my array bytes 4-9 are 0 and byte 10 is 0x03, which is Unix (that is also true: the web server I got this data from is a *nix machine).
Then comes the gzip DEFLATE payload (the actual DEFLATE stream).
After the stream come the 4-byte CRC32 and, as the last 4 bytes, the UNCOMPRESSED SIZE of the DEFLATE stream.
Information source: RFC 1952, RFC 2616, RFC 1951, the zlib web page.
I don't know why qUncompress wants the size at the beginning; according to the gzip standard it should be in the last 4 bytes, not the first, hence the "magic" with prepending the size.
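
The layout listed above can be read back from the captured bytes directly. This is a minimal sketch in standard C++, assuming a gzip member with no optional header fields (FLG == 0), as in the posted capture; `GzipInfo`, `readLE32`, `parseGzipFraming` and `kSample` are hypothetical names introduced for illustration. Per RFC 1952 the two trailer fields are stored little-endian.

```cpp
#include <cstddef>
#include <cstdint>

// Framing fields of a gzip member without optional header fields (RFC 1952).
struct GzipInfo {
    bool valid;           // magic bytes and compression method look right
    std::uint8_t flags;   // FLG (byte 4 in 1-based numbering)
    std::uint8_t os;      // OS (byte 10)
    std::uint32_t crc32;  // CRC-32 of the uncompressed data
    std::uint32_t isize;  // uncompressed size modulo 2^32
};

// Read a 32-bit little-endian value, the byte order used by the gzip trailer.
static std::uint32_t readLE32(const unsigned char *p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: inspect header and trailer without decompressing anything.
GzipInfo parseGzipFraming(const unsigned char *data, std::size_t size)
{
    GzipInfo info = {false, 0, 0, 0, 0};
    if (size < 18)  // 10-byte header + 8-byte trailer is the minimum
        return info;
    if (data[0] != 0x1F || data[1] != 0x8B || data[2] != 0x08)
        return info;
    info.valid = true;
    info.flags = data[3];
    info.os = data[9];
    info.crc32 = readLE32(data + size - 8);
    info.isize = readLE32(data + size - 4);
    return info;
}

// The 40-byte stream from the post.
static const unsigned char kSample[40] = {
    0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xaa, 0x2e, 0x2e, 0x49, 0x2c, 0x29,
    0x2d, 0xb6, 0x4a, 0x4b, 0xcc, 0x29, 0x4e, 0xad, 0x05, 0x00, 0x00, 0x00, 0xff, 0xff, 0x03, 0x00,
    0x2a, 0x63, 0x18, 0xc5, 0x0e, 0x00, 0x00, 0x00
};
```

Calling `parseGzipFraming(kSample, sizeof kSample)` reports FLG 0, OS 3 and an uncompressed size of 14, which leaves 22 bytes of DEFLATE payload between the header and the trailer.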
I debugged the code above down into zlib's inflate.c. It seems that inflate sets up the stream correctly, but at inflate.c line 596:

if ((state->wrap & 2) && hold == 0x8b1f) { /* gzip header */ ... } // == false, why?
and then, at inflate.c lines 606 to 613:

if (!(state->wrap & 1) || /* check if zlib header allowed */
#else
if (
#endif
((BITS(8) << 8) + (hold >> 8)) % 31) { // why is this true?
strm->msg = (char *)"incorrect header check"; // <- the error is set here
state->mode = BAD; // mode becomes BAD, then break
break;
}
and the miserable result:

qUncompress: Z_DATA_ERROR: Input data is corrupted
It seems that the error is either in inflateInit() or inflateInit2().
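
The two questions in the comments above can be reproduced with plain arithmetic. The `% 31` test is the zlib-format (RFC 1950) header check: the first two bytes, read as CMF * 256 + FLG, must be a multiple of 31. For the gzip magic that value is 0x1F8B = 8075, and 8075 % 31 = 15, so the check fails. As far as I can tell, the gzip branch is only taken when `state->wrap & 2` is set, which zlib does when inflateInit2() is called with 16 (or 32) added to windowBits; a plain inflateInit(), which is what zlib's uncompress() uses, leaves it clear. The sketch below is standard C++ with hypothetical helper names; the 0x78 0x9C pair is the header zlib commonly emits at its default compression level.

```cpp
#include <cstdint>

// zlib-format header test (RFC 1950): the low nibble of CMF must be 8 (DEFLATE)
// and CMF * 256 + FLG must be divisible by 31.
bool looksLikeZlibHeader(std::uint8_t cmf, std::uint8_t flg)
{
    const unsigned check = (static_cast<unsigned>(cmf) << 8) + flg;
    return (cmf & 0x0F) == 8 && check % 31 == 0;
}

// gzip magic bytes (RFC 1952): ID1 = 0x1F, ID2 = 0x8B.
bool looksLikeGzipHeader(std::uint8_t id1, std::uint8_t id2)
{
    return id1 == 0x1F && id2 == 0x8B;
}
```

`looksLikeZlibHeader(0x1F, 0x8B)` is false and `looksLikeZlibHeader(0x78, 0x9C)` is true (0x789C = 30876 = 31 * 996), which matches the "incorrect header check" seen in the debugger.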
But when I decompress this myself, like this:

#include "zlib.h"
QByteArray gzipHttpDec::gzipDecompress( QByteArray compressData )
{
    // Decompress GZIP data by stripping the gzip framing and inflating the raw DEFLATE payload.

    // Strip the 10-byte header and the 8-byte trailer (CRC32 + uncompressed size).
    // This assumes FLG == 0, i.e. no optional header fields are present.
    if( compressData.size() < 18 )
        return QByteArray();
    compressData.remove(0, 10);
    compressData.chop(8);

    const int buffersize = 16384;
    quint8 buffer[buffersize];

    z_stream cmpr_stream;
    cmpr_stream.zalloc = Z_NULL;
    cmpr_stream.zfree = Z_NULL;
    cmpr_stream.opaque = Z_NULL;
    cmpr_stream.next_in = (unsigned char *)compressData.data();
    cmpr_stream.avail_in = compressData.size();

    // Negative windowBits selects raw DEFLATE; -MAX_WBITS accepts any window size.
    if( inflateInit2(&cmpr_stream, -MAX_WBITS ) != Z_OK) {
        qDebug() << "cmpr_stream error!";
        return QByteArray();
    }

    QByteArray uncompressed;
    int status;
    do {
        // Hand zlib a fresh output buffer on every pass.
        cmpr_stream.next_out = buffer;
        cmpr_stream.avail_out = buffersize;
        status = inflate( &cmpr_stream, Z_SYNC_FLUSH );

        if(status != Z_OK && status != Z_STREAM_END)
            break; // corrupt or truncated input: return what was decoded so far

        uncompressed.append((const char *)buffer, buffersize - cmpr_stream.avail_out);
    } while(status != Z_STREAM_END);

    inflateEnd(&cmpr_stream);
    return uncompressed;
}
the data is correctly decoded (not only the posted dat[40], but ANY gzip data that comes from the web; tested and works fine). The above function is not perfect, because it doesn't verify that the stream is valid; it is only a draft, but it works with the above dat[] array.

So to sum up, I don't have the slightest idea why qUncompress doesn't decompress it. I saw in inflate.c that they use inflateInit2(), and my data is indeed passed to inflateInit2(), but the error occurs.
I don't really want to write my own decompressor, but at the moment it works, so if someone knows where my mistake is, please point it out.
Best regards.
EDIT: Here is my other post, due to lack of response here: http://stackoverflow.com/questions/2690328/qt-quncompress-gzip-data.
EDIT2: I assume that only the SIZE should be big-endian, not the DATA itself; maybe that's the mistake?

JohannesMunk
26th April 2010, 10:06
Hi!

Maybe qUncompress doesn't like that zlib version? The header seems different between your zlib stream and the one produced by qCompress. If you debug qUncompress, are there any branches for different headers?

Good luck,

Johannes

Doru
3rd December 2010, 13:48
Hi,
Did you succeed in uncompressing gzip data with qUncompress?
Can you give me some hints?

I have the same problem when trying to decode an SWF file header...

Thanks

Talei
5th December 2010, 01:50
No. I wrote my own function to do that; see the last code snippet.