PDA

View Full Version : Multithreaded per pixel operations on QImage



N¤X
11th September 2009, 23:47
Hi, I'm writing a small program which does per pixel operations on an QImage. Because of its use to handle large images I want several threads to work on the image. Therefore I handle n parts of the image with QtConcurrent, which sadly doesn't work properly.
As a first test I just try to convert the image to grayscale, but instead of gray the Image gets black, just a few Pixels (ordered in a characteristic pattern :eek:) really get gray. I add an result image under the post...
Here is the code I wrote:


// imgOrig is QImage* which holds the original image
// imgDone is QImage* which should hold the result image

// function which is called
void Scan::convertToGrayscale() {
// get best number of threads
int n = QThreadPool::globalInstance()->maxThreadCount();
// calculate part width (each thread will handle a part of the image)
int step = imgOrig->width() / n;
for (int i=0; i<n; i++) {
if (i+1 < n) {
QtConcurrent::run(this, &Scan::convertToGrayscale, i*step, (i+1)*step);
}
else {
// the last part should really reach the end (int a/b*b != a)
QtConcurrent::run(this, &Scan::convertToGrayscale, i*step, imgOrig->width());
}
}
QThreadPool::globalInstance()->waitForDone();
}

// function to handle a part of the image
void Scan::convertToGrayscale(int from, int to) {
int temp;
for (int i=from; i<to; i++) {
for (int j=0; j<imgOrig->height(); j++) {
temp = qGray(imgOrig->pixel(i, j));
imgDone->setPixel(i, j, qRgb(temp, temp, temp));
}
}
}

The Screenshot was scaled down to 25% in width and heigth, but you can still see the strange paterns. The System I runned the program on has two cores, as you can see the image really is split into two halfes with similar patern. Is this misbehaviour because both threads write into the same QImage or something?

PS: Qt 4.5 under Windows XP SP3, don't know if this matters.

wysota
12th September 2009, 10:05
I'd use QtConcurrent::blockingMappedReduced(). First divide the image into chunks, then map the chunks to grayscale chunks and then reduce the chunks into a single image again.

BTW. What is the format (and especially the depth) of your image?

N¤X
12th September 2009, 11:22
thx for the quick answer :)

The format of the images is simple QImage::Format_RGB32 (The images should come from a scanner or digital camera)

How should I devide the image to chunks? Should I create smaller images and copy the chunks into them and then back into the original or how is that meant? I don't know if this is fast enough, because I intend to run multiple operations on the image, and for some i can only devide horizontally and some only vertically (for more complex operations i need to process a whole h/v line of pixels). I don't want to physically devide the image multiple times during one run if this is possible :/
But maybe I just get you wrong or there is an easy and fast way to do so, if yes, please be a little more specific and I'll try it out :)

N¤X
13th September 2009, 13:59
sry for the double post, but the edit button vanished Oo
I found a solution, the "setPixel" function isn't atomic, so a mutex fixed it -_-

mutex.lock();
imgDone->setPixel(i, j, qRgb(temp, temp, temp));
mutex.unlock();


But now there is another problem: the mutexed setPixel slows the function down massively, even more than single threaded, but it doesn't seem to be because of the mutex. Here a few examples:

// multithreaded, ~11sec runtime on two cores
void Scan::convertToGrayscale() {
[...]
QtConcurrent::run(this, &Scan::convertToGrayscale, i*step, (i+1)*step);
[...]
}

void Scan::convertToGrayscale(int from, int to) {
[...]
mutex.lock();
imgDone->setPixel(i, j, qRgb(temp, temp, temp));
mutex.unlock();
}
}
}

// not threaded, ~2sec runtime Oo
void Scan::convertToGrayscale() {
[...]
//QtConcurrent::run(this, &Scan::convertToGrayscale, i*step, (i+1)*step);
convertToGrayscale(i*step, (i+1)*step);
[...]
}

void Scan::convertToGrayscale(int from, int to) {
[...]
mutex.lock();
imgDone->setPixel(i, j, qRgb(temp, temp, temp));
mutex.unlock();
}
}
}
OK, was the same code, it seems the mutex is the problem and setPixel seems fast, but lets validate this assumption. According to this assumption the runtime still should be long if I replace the setPixel but keep the mutex:

// multithreaded with mutex but without setPixel, ~2sec
void Scan::convertToGrayscale() {
[...]
QtConcurrent::run(this, &Scan::convertToGrayscale, i*step, (i+1)*step);
[...]
}

void Scan::convertToGrayscale(int from, int to) {
[...]
mutex.lock();
temp = 5;
//imgDone->setPixel(i, j, qRgb(temp, temp, temp));
mutex.unlock();
}
}
}
The mutex, which seemed to be the problem, doesn't seem to have any impact here, but the setPixel, which seemed to be no problem seems now to be the problem. I don't get it :/

Has anybody experiences with multithreading and imagefiles? I wan't to speed it up because later operations are really time consuming, but if the write process gets slower with multiple threads the whole optimization doesn't make sense. -_-

wysota
13th September 2009, 15:25
setPixel is slow, use QImage::scanLine() instead. You won't need to protect it with a mutex.

N¤X
13th September 2009, 18:15
Thx for the clue. I just implemented it and ... the same patterns again! And again a mutex solves the pattern problem but slows the whole thing down :crying:
This works now, and with ~7sec runtime its faster then with setPixel, but still slower than without multithreading (~2sec) -_-

void Scan::convertToGrayscale(int from, int to) {
int temp;
for (int i=from; i<to; i++) {
for (int j=0; j<imgOrig->height(); j++) {
temp = qGray(imgOrig->pixel(i, j));
mutex.lock();
//imgDone->setPixel(i, j, qRgb(temp, temp, temp));
*((QRgb*) imgDone->scanLine(j)+i) = qRgb(temp, temp, temp);
mutex.unlock();
}
}
}
I'll replace the pixel() function with scanline too, but the general problem of being slower with multiple threads than with only one still exists :/ Any Ideas about that? ^^

wysota
14th September 2009, 08:29
This works for me quite well:

#include <QtGui>


struct ImageRef {
QRgb *data;
int width;
};

void convertToGS(const ImageRef &ref){
for(int i=0;i<ref.width;i++){
int gray = qGray(ref.data[i]);
ref.data[i] = qRgb(gray, gray, gray);
}
}

int main(int argc, char **argv){
if(argc<2) return 1;
QApplication app(argc, argv);
QImage img(argv[1]);
QList<ImageRef> refs;
for(int i=0;i<img.height();i++){
ImageRef ref;
ref.data = (QRgb*)img.scanLine(i);
ref.width = img.width();
refs << ref;
}
QtConcurrent::blockingMap(refs, convertToGS);
QLabel l;
l.setPixmap(QPixmap::fromImage(img));
l.show();
return app.exec();
}

Multithreaded version takes 7ms as opposed to singlethreaded version that takes 12ms to convert a 1600x1200 image to grayscale.

N¤X
14th September 2009, 11:46
Thank you, this really looks nice and clean! I will use this in the programm :)
But there still remains one problem: The slowest function of my programm mainly consists of a 2-pass algorithm that runs on scanlines, in one pass on horizontal ones and in the other pass on vertical ones. Due to the structure of QImages the direct accessible scanlines are all horizontal, so I can only use your solution in one pass.
I'll try if it still works if you don't overwrite the picture data but paint the result in a new one (the write-to-another-picture-operation was the problem from the beginning -_-). Then i could write the result of the first pass transposed into another Picture and just run the same operations over the new picture to get the final result... I'll write when I tried this! :)

wysota
14th September 2009, 13:29
Thank you, this really looks nice and clean! I will use this in the programm :)
But there still remains one problem: The slowest function of my programm mainly consists of a 2-pass algorithm that runs on scanlines, in one pass on horizontal ones and in the other pass on vertical ones. Due to the structure of QImages the direct accessible scanlines are all horizontal, so I can only use your solution in one pass.
No, that's not true. After the horizontal pass transpose the image and use scanlines again. Then transpose it back. You'll get improved cpu cache performance as well (thanks to Ariya for making me aware of that) and as a result it will probably be faster than operating on the picture directly.