Calculate MD5 sum of a big file



realdarkman71
25th February 2012, 18:27
Hi all,

I need the MD5 sum of a very large file. I found this:


QFile file("/home/chris/backup.img");

if (file.open(QIODevice::ReadOnly)) {
    QByteArray fileData = file.readAll();
    QByteArray hashData = QCryptographicHash::hash(fileData, QCryptographicHash::Md5);

    qDebug() << hashData.toHex();
}


but with a file larger than 8 GB, reading everything into memory is overkill!

Are there any other methods?

Thanks!
Chris

Talei
25th February 2012, 19:41
QCryptographicHash::hash() is a static "helper" function; what you are looking for is QCryptographicHash::addData(), which lets you feed the data to the hash in chunks.

This was answered here: http://www.qtcentre.org/threads/35674-QCryptographicHash-an-entire-file
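
A minimal sketch of the chunked pattern described above, written in standard C++ so it stands alone. ChunkedHash is a hypothetical stand-in for QCryptographicHash: it computes 64-bit FNV-1a, not MD5, and only mirrors the addData()/result() usage. The helper names hashInChunks, hashStringInChunks and hashStringAtOnce are likewise made up for this illustration. In Qt you would construct QCryptographicHash(QCryptographicHash::Md5) and call addData() with each buffer that QFile::read() fills.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for QCryptographicHash. It computes 64-bit FNV-1a,
// NOT MD5; only the addData()/result() usage pattern matters here.
class ChunkedHash {
public:
    void addData(const char *data, std::size_t length) {
        for (std::size_t i = 0; i < length; ++i) {
            state_ ^= static_cast<unsigned char>(data[i]);
            state_ *= 1099511628211ULL;  // FNV prime
        }
    }
    std::uint64_t result() const { return state_; }

private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV offset basis
};

// Hash a stream piece by piece, holding at most bufferSize bytes in memory.
std::uint64_t hashInChunks(std::istream &in, std::size_t bufferSize) {
    ChunkedHash hash;
    std::vector<char> buf(bufferSize);
    for (;;) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) {
            break;  // nothing more to read
        }
        // Feed only the bytes that were actually read.
        hash.addData(buf.data(), static_cast<std::size_t>(got));
    }
    return hash.result();
}

// Convenience wrapper: hash a string through the chunked reader.
std::uint64_t hashStringInChunks(const std::string &data, std::size_t bufferSize) {
    std::istringstream in(data);
    return hashInChunks(in, bufferSize);
}

// Reference: hash the whole string in one call, as readAll() + hash() would.
std::uint64_t hashStringAtOnce(const std::string &data) {
    ChunkedHash hash;
    hash.addData(data.data(), data.size());
    return hash.result();
}
```

Feeding the data in pieces gives the same result as hashing it in one go, so the buffer size only affects memory use and speed, not the digest.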

realdarkman71
25th February 2012, 22:18
That works, thanks for the hint!

Is there a way to calculate the MD5 sum of a DVD (/dev/cdrom) with this method? It must read the DVD's raw data, not the files on the disc!

ChrisW67
25th February 2012, 22:28
Open and read the /dev/cdrom device directly: you get the bytes from one end of the disc to the other.

realdarkman71
25th February 2012, 22:35
How? QFile doesn't work!

ChrisW67
26th February 2012, 23:02
Sure it does, but QIODevice::atEnd() is not reliable on block special files so you have to adopt a slightly different approach:


QFile in("/dev/cdrom");
if (in.open(QIODevice::ReadOnly)) {
    qDebug() << "Opened ok";
    char buf[2048];
    int bytesRead;
    while ((bytesRead = in.read(buf, 2048)) > 0) {
        qDebug() << "Read" << bytesRead;
    }
    qDebug() << "Final bytesRead value" << bytesRead;
    in.close();
}

This will fail to open if there is no disc in the drive. In a real program you would use a larger buffer, distinguish between bytesRead == 0 (no more data) and bytesRead == -1 (read error), etc.

realdarkman71
27th February 2012, 11:23
Thanks for the hint, but it doesn't work for me! The MD5 sums of the image and the DVD are not the same! On the command line they are identical! Or is the code wrong?



QCryptographicHash hash(QCryptographicHash::Md5);
QFile in("/dev/cdrom");

if (in.open(QIODevice::ReadOnly)) {
    char buf[2048];

    while (in.read(buf, 2048) > 0) {
        hash.addData(buf, 2048);
    }

    in.close();
    qDebug() << hash.result().toHex();
}
else {
    qDebug() << "Failed to open device!";
}

wysota
27th February 2012, 12:05
How many bytes did you manage to read from the device?

realdarkman71
27th February 2012, 12:32
...the while loop reads all data from dvd!?

wysota
27th February 2012, 12:37
> ...the while loop reads all data from dvd!?

Was this supposed to be an answer to my question? If so, it's not what I expected. I expected you to change your code to count how many bytes were actually read, not to assume that the loop reads "all" the data.

realdarkman71
27th February 2012, 12:42
Ok, but how? ...sorry!

wysota
27th February 2012, 12:51
Read the docs to learn what QFile::read() returns. Then use your newly acquired knowledge to count how many bytes in total were read from the file.

realdarkman71
27th February 2012, 13:15
Ok, I used this:



QCryptographicHash hash(QCryptographicHash::Md5);
QFile in("/dev/cdrom");

if (in.open(QIODevice::ReadOnly)) {
    char buf[2048];
    int bytesRead;
    qint64 overallBytesRead = 0;

    while ((bytesRead = in.read(buf, 2048)) > 0) {
        overallBytesRead += bytesRead;
        hash.addData(buf, 2048);
    }

    in.close();
    qDebug() << "overall bytes read:" << overallBytesRead;
    qDebug() << hash.result().toHex();
}
else {
    qDebug() << "Failed to open device!";
}


When it finishes, overallBytesRead is 8738865152, but the image file is 8738846720 bytes. It reads more bytes than there are on the DVD! I'm confused!

Lesiok
27th February 2012, 14:12
Line 11 (the hash.addData() call) should be:
    hash.addData(buf, bytesRead);
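
To see why the length argument matters, here is a small standard C++ sketch. It assumes nothing about Qt: bytesFedToHash is a hypothetical helper that simulates the read loop over an in-memory string and returns exactly the bytes that would be handed to the hash, either with the fixed buffer size or with the real bytesRead count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Returns the exact bytes a read loop would pass to the hash.
// useBytesRead == false reproduces the original call (always the full buffer);
// useBytesRead == true is the corrected call.
std::string bytesFedToHash(const std::string &source, std::size_t bufferSize,
                           bool useBytesRead) {
    std::string fed;
    if (bufferSize == 0) {
        return fed;  // nothing can be read with an empty buffer
    }
    std::vector<char> buf(bufferSize, '\0');
    std::size_t offset = 0;
    while (offset < source.size()) {
        // Simulated read(): the last call may return fewer bytes than asked for.
        const std::size_t bytesRead = std::min(bufferSize, source.size() - offset);
        std::memcpy(buf.data(), source.data() + offset, bytesRead);
        offset += bytesRead;
        // With the fixed size, stale bytes left in buf after a short read
        // are fed to the hash as well.
        fed.append(buf.data(), useBytesRead ? bytesRead : bufferSize);
    }
    return fed;
}
```

With 5000 bytes of input and a 2048-byte buffer, the fixed-size variant feeds 6144 bytes (three full buffers) while the corrected variant feeds exactly the 5000 bytes that were read. When every read happens to fill the buffer, the two variants feed the same bytes, which is why the mistake can go unnoticed.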

ChrisW67
27th February 2012, 22:49
This is not a failing of Qt, but a failing of understanding the data you are handling. The DVD/CD will almost always contain extra padding data at the end of the supplied image to meet requirements of the specifications. You should read (up to) as many bytes from the DVD as are in the image you are trying to compare with. So, for example, a Gentoo image:


// Original image file
chrisw@newton ~ $ dd if=install-amd64-minimal-20110609.iso bs=2048 | md5sum
64900+0 records in // <<< this is the number of blocks in the image
64900+0 records out
132915200 bytes (133 MB) copied, 4.39815 s, 30.2 MB/s
3acf53667fcf1d03e98068ee4af5f4a3 -

// Reading all data from the raw device... fails
chrisw@newton ~ $ dd if=/dev/cdrom bs=2048 | md5sum
64963+0 records in
64963+0 records out
b0700288a316b71dee09ed87dce3b160 - // <<<< Not good
133044224 bytes (133 MB) copied, 39.3402 s, 3.4 MB/s

// reading correct number of blocks from the device matches
chrisw@newton ~ $ dd if=/dev/cdrom bs=2048 count=64900 | md5sum
64900+0 records in
64900+0 records out
3acf53667fcf1d03e98068ee4af5f4a3 - // <<< Sweet :)
132915200 bytes (133 MB) copied, 37.9759 s, 3.5 MB/s
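
The count=64900 run above can be mirrored in code by stopping after a known number of bytes. The following is only a sketch in standard C++ under stated assumptions: hashFirstBytes and hashFirstBytesOf are hypothetical helper names, and the checksum is 64-bit FNV-1a used as a stand-in because MD5 is not in the standard library. With Qt, the same loop would call QFile::read() and QCryptographicHash::addData().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hash at most `limit` bytes from the stream and ignore anything after them,
// such as padding blocks at the end of a disc. FNV-1a stand-in, NOT MD5.
std::uint64_t hashFirstBytes(std::istream &in, std::uint64_t limit,
                             std::size_t bufferSize) {
    std::uint64_t state = 14695981039346656037ULL;  // FNV offset basis
    std::vector<char> buf(bufferSize);
    std::uint64_t remaining = limit;
    while (remaining > 0) {
        // Never ask for more than is still wanted.
        const std::size_t want = static_cast<std::size_t>(
            std::min<std::uint64_t>(remaining, buf.size()));
        in.read(buf.data(), static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) {
            break;  // source ended before the limit was reached
        }
        for (std::size_t i = 0; i < got; ++i) {
            state ^= static_cast<unsigned char>(buf[i]);
            state *= 1099511628211ULL;  // FNV prime
        }
        remaining -= got;
    }
    return state;
}

// Convenience wrapper for in-memory data.
std::uint64_t hashFirstBytesOf(const std::string &data, std::uint64_t limit,
                               std::size_t bufferSize) {
    std::istringstream in(data);
    return hashFirstBytes(in, limit, bufferSize);
}
```

Hashing the first 5000 bytes of a source that carries 3000 trailing zero bytes gives the same value as hashing the 5000 bytes alone, which is the property the image-versus-disc comparison relies on.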

realdarkman71
28th February 2012, 09:14
Okay, I understand! But how can I do this in my code? This works for me:


QCryptographicHash hash(QCryptographicHash::Md5);
QFile in("/dev/cdrom");
QFileInfo fileInfo("/home/chris/backup.img");
qint64 imageSize = fileInfo.size();

if (in.open(QIODevice::ReadOnly)) {
    char buf[2048];
    int bytesRead;
    qint64 overallBytesRead = 0;

    while ((bytesRead = in.read(buf, 2048)) > 0) {
        overallBytesRead += bytesRead;
        hash.addData(buf, 2048);

        if (overallBytesRead == imageSize) {
            break;
        }
    }

    in.close();
    qDebug() << "overall bytes read:" << overallBytesRead;
    qDebug() << hash.result().toHex();
}
else {
    qDebug() << "Failed to open device!";
}

Is that right?

ChrisW67
28th February 2012, 22:36
This works because the image will be an exact number of 2048-byte blocks. If you change that buffer size, you might need to handle a final partial block in both the in.read() call and the hash.addData() call (lines 11 and 13 of your code). For example, if you used a 10000-byte buffer with my Gentoo image of 132,915,200 bytes, the last block would be 5200 bytes, and you do not want to read more than that or your hash will be affected.

I'd be inclined to do it this way:


QCryptographicHash hash(QCryptographicHash::Md5);
QFile in("/dev/cdrom");
QFileInfo fileInfo("/home/chris/backup.img");
qint64 imageSize = fileInfo.size();   // bytes still to be read from the disc

const int bufferSize = 10000;
if (in.open(QIODevice::ReadOnly)) {
    char buf[bufferSize];
    int bytesRead;

    // qMin needs both arguments to have the same type, hence qMin<qint64>.
    int readSize = qMin<qint64>(imageSize, bufferSize);
    while (readSize > 0 && (bytesRead = in.read(buf, readSize)) > 0) {
        imageSize -= bytesRead;
        hash.addData(buf, bytesRead);
        // Never ask for more than what is left of the image.
        readSize = qMin<qint64>(imageSize, bufferSize);
    }

    in.close();
    qDebug() << hash.result().toHex();
}
else {
    qDebug() << "Failed to open device!";
}


There is always more than one way to do it.
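
The partial-block arithmetic mentioned above can be checked with a few lines of standard C++. ReadPlan and planReads are hypothetical names introduced only for this illustration; the sketch assumes every read except possibly the last fills the buffer completely.

```cpp
#include <cstdint>

// How many full-buffer reads an image needs, and how many bytes are
// left over for a final partial read (0 means it divides evenly).
struct ReadPlan {
    std::int64_t fullReads;
    std::int64_t lastReadSize;
};

ReadPlan planReads(std::int64_t imageSize, std::int64_t bufferSize) {
    return { imageSize / bufferSize, imageSize % bufferSize };
}
```

For the 132,915,200-byte image, a 10000-byte buffer gives 13291 full reads plus a final read of 5200 bytes, while a 2048-byte buffer gives exactly 64900 full reads and no remainder.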

wysota
28th February 2012, 23:18
The smartest thing would probably be to detect the end of the image data instead of relying on having the image size given upfront. Padding probably begins with some fixed pattern or there is some ioctl call that can return the real data size. Otherwise it wouldn't be possible to create the image in the first place.

ChrisW67
29th February 2012, 00:55
In the case of my Gentoo CD-ROM the extra 63 blocks on the CD are all zero. Unfortunately, so are at least the last 100 blocks of the original image. The ends of my VirtualBox additions CD image and of a data DVD are similar. I don't claim to know the complete ins and outs of the various standards, but it does seem that zero padding is done at several stages in the mastering and writing of an image (the -pad option in mkisofs, for example).

If the point of the exercise is to compare an image to the version on a disc then knowing the image size up front is hardly unreasonable.

realdarkman71
29th February 2012, 12:08
Okay, I know the image size at this point. I read "image size" / 2048 blocks from the DVD in a "for" loop. That works, the checksums are identical!

Thank you all very much for your help!

realdarkman71
29th February 2012, 22:41
Sorry, but another problem: on Windows I cannot open the DVD drive!


QFile dvd("D:");

if (dvd.open(QIODevice::ReadOnly)) {
    (...)
}


dvd.open() returns false, even when running as administrator! Any ideas?

ChrisW67
1st March 2012, 01:19
Devices are not files on Windows. The entire conversation to date has been about Linux/UNIX systems so nobody has pointed this out. You may need to use Windows API calls to do this, e.g. http://www.codeproject.com/Articles/15725/Tutorial-on-reading-Audio-CDs

From that article it may be worth trying QFile dvd("\\\\.\\D:") as the file name first... but I wouldn't hold out much hope.

realdarkman71
1st March 2012, 19:44
QFile dvd("\\\\.\\D:") doesn't work ... pity!