QtConcurrent - everything seems ok, but no improvement on execution time



dlib
8th April 2012, 20:22
Hi all, I am facing problems when I try to parallelize some parts of my code using QtConcurrent. I am writing an optimization routine, and parallelization occurs in several places. First, I parallelize the random generation of points and the function evaluation at these points. Then, I parallelize the calls to the optimization algorithm, let's call it the Algo() method, in a mainOptim() routine. Finally, I parallelize the operations needed for a function evaluation at one given point. I can sum up my problem's architecture in pseudo-C++ code this way:

void mainOptim()
{
    QList<Point> points_ = randomPoint(); // create M random points and their function evaluations

    // I parallelize these N calls to Algo
    for (int i = 0; i < N; i++)
    {
        initPt = points_[i].point; // starting point for this run of Algo
        Algo();
    }
}

QList<Point> randomPoint()
{
    QList<Point> points; // QList is a value type, no 'new' needed
    for (int i = 0; i < M; i++)
    {
        Point p; // one point and its function evaluation
        for (int j = 0; j < ndim; j++)
        {
            p.point[j] = rand();
        }
        p.eval = funcEval(p.point);
        points.append(p);
    }
    return points;
}

double funcEval(double* point)
{
    double d = 0.0;
    /*
    some parallelized calls to an interpreter here
    */
    return d;
}

funcEval is called many times in succession in Algo(), so both mainOptim and randomPoint use 'nested' parallelization.

Now that I have set the stage, I have a lot of unanswered questions I would like to ask:

1/ USING QtConcurrent for non-QObject classes:
Does QtConcurrent only work when the class 'launching' the parallelized work is a QObject?

2/ HOW could the presence or absence of a QTime object have an impact on whether parallelization succeeds?
I wrapped the parallelized code in the funcEval() method with QTime calls (t.start() and t.elapsed()) to measure the time spent in the parallelized code. I then commented out the lines concerning the QTime object, and it seems that parallelization only works when the QTime-related calls are present. This may sound preposterous, but QTime is related to QCoreApplication, and I suspect the presence of the QTime object triggers some refresh operation in the threads or the like. Is that possible?

3/ WHAT could hinder parallelization? Is nested parallelization forbidden?

4/ IS nested parallelization of any use? When I parallelize calls to one method, I actually run up to 8 such calls in parallel, then wait for available threads, and so on. I doubt that nested parallelization can improve execution time. If I am running three threads for three main method calls, and then 200 parallelized calls inside each main method call, I am using 3 threads, which then call on the 5 remaining threads as they become available. In theory, this won't make things go faster. What do you think?

5/ PARALLELIZATION of the random point generation uses QtConcurrent::run() on a static member function of the class Random. I also want to 'reseed' the random number generator each time a thread is created. So I create an instance of the Random class at each thread creation, and call a static method newSeed() on this instance. I don't know whether I am doing this properly: the methods are static, and I am afraid I am changing the seed not only for the Random instance I created in my new thread, but for all Random instances in the running threads.
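A minimal sketch of one way to avoid the shared-seed worry in question 5, assuming standard C++ (std::async and std::mt19937) as a stand-in for QtConcurrent::run() and the Random class. The names randomPointWithSeed and randomPoints, and the "base seed plus task index" scheme, are hypothetical and not taken from the original code. Each task builds its own generator, so no seed is shared between threads.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <random>
#include <vector>

// Hypothetical helper: builds one random point from an explicit seed.
// The generator is a local object, so reseeding here cannot affect other tasks.
std::vector<double> randomPointWithSeed(std::uint32_t seed, int ndim) {
    assert(ndim > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> p(ndim);
    for (double& x : p) x = dist(gen);
    return p;
}

// Launches one asynchronous task per point; task i uses seed baseSeed + i
// (an assumed scheme, chosen only to keep the runs reproducible).
std::vector<std::vector<double>> randomPoints(std::uint32_t baseSeed, int m, int ndim) {
    std::vector<std::future<std::vector<double>>> futures;
    for (int i = 0; i < m; ++i) {
        futures.push_back(std::async(std::launch::async, randomPointWithSeed,
                                     static_cast<std::uint32_t>(baseSeed + i), ndim));
    }
    std::vector<std::vector<double>> points;
    for (auto& f : futures) points.push_back(f.get()); // wait for each task
    return points;
}
```

Calling randomPoints(42u, 4, 3) twice should give the same four points, which makes it easy to check that the parallel version is not disturbed by thread scheduling.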

Thank you for any help on some or all of these items. Please tell me if code or more info is needed. I am stuck here.

Regards.

d_stranz
8th April 2012, 20:55
I can't answer your specific questions, but usually when I am trying something new, I write a prototype program that simply tests the basic concept I am trying to implement. In your case, this is QtConcurrent. So, write a prototype that does something simple, many times. (Like fill up a 2D array of 1 million entries) Then parallelize this to create N concurrent threads (like one for each row), have them all do the same thing in parallel, and see if you get any speedup. If that works, then substitute a simplified version of your Algo() (with no nested parallel execution). Keep building it up until you understand what isn't working. It could be that what you expect QtConcurrent to do and what it actually does are different.
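A minimal sketch of such a prototype, assuming plain std::thread rather than QtConcurrent so that it compiles without Qt. The cell() function and the names fillSerial and fillParallel are hypothetical. It fills a 2D array once serially and once with one thread per row, so the two versions can be compared for correctness first and timed afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Hypothetical stand-in for "something simple, done many times".
inline double cell(std::size_t r, std::size_t c) {
    return static_cast<double>(r * 31 + c);
}

// Serial baseline: every row is filled in the calling thread.
Grid fillSerial(std::size_t rows, std::size_t cols) {
    Grid g(rows, std::vector<double>(cols));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            g[r][c] = cell(r, c);
    return g;
}

// Parallel version: one thread per row. Each thread writes only its own row,
// so no locking is needed.
Grid fillParallel(std::size_t rows, std::size_t cols) {
    assert(rows > 0 && cols > 0);
    Grid g(rows, std::vector<double>(cols));
    std::vector<std::thread> workers;
    workers.reserve(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        workers.emplace_back([&g, r, cols] {
            for (std::size_t c = 0; c < cols; ++c)
                g[r][c] = cell(r, c);
        });
    }
    for (auto& t : workers) t.join(); // wait for all rows
    return g;
}
```

With work this cheap per cell, the parallel version may well be no faster than the serial one, which is itself a useful thing to learn from a prototype.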

Don't forget locking and mutexes. It could also be that if you are accessing the same memory in each thread, then QtConcurrent only lets one thread in at a time. In this case, you have an essentially serial process, no matter how many threads, and it will run more slowly because of synchronization.
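To illustrate the serialization effect, here is a small sketch in standard C++ (std::thread and std::mutex, not QtConcurrent; the function names are hypothetical). Both functions compute the same sum. In the first, every addition takes the same lock, so the threads mostly wait for each other. In the second, each thread works on its own partial result and the results are merged only after the threads have finished.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// All threads update one shared total under one mutex:
// correct, but the work is effectively done one thread at a time.
long long sumWithSharedLock(const std::vector<int>& data, unsigned nThreads) {
    assert(nThreads > 0);
    long long total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < data.size(); i += nThreads) {
                std::lock_guard<std::mutex> lock(m);
                total += data[i];
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Each thread accumulates into a local variable and writes one slot at the end,
// so the threads never contend for a lock.
long long sumWithPartials(const std::vector<int>& data, unsigned nThreads) {
    assert(nThreads > 0);
    std::vector<long long> partial(nThreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            long long local = 0;
            for (std::size_t i = t; i < data.size(); i += nThreads)
                local += data[i];
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Timing the two on a large input is one way to see how much a shared lock costs on a given machine.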

Just guessing here; that's why I write prototypes - so I can stop guessing and start understanding. :-)

dlib
8th April 2012, 21:09
Thanks d_stranz.

I implemented the things I am describing step by step (in the real code and not in prototypes, admittedly). First I implemented non-nested parallelization: it works perfectly, reducing computation time by 1/3 or so. But when I remove the QTime object I had put around the code, it no longer works. I understand that QTime has some impact on threads, but I don't know what exactly, which is why I am asking here. Then I implemented the two nested parallelizations I am describing, and I am very surprised that I do not get any additional time savings. I understand that nested parallelization may be a bad idea, and that is what I am raising in this forum too.

So yes, I tried, and yes, I did isolate the issues, but I have no answers for now, which is why I am using the forum for help.

Regarding concurrent memory access: I duplicated the instances I am working on, but I don't know whether I am handling the class with static member functions properly. If a static function means that a kind of singleton pattern is implemented for this class, then I may indeed have the concurrent access problem you mention. If not, then since I am creating a new instance of this class for each thread, I think that no concurrent access should occur. Any further ideas regarding the questions I raised?
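On the static question, a short sketch in plain C++ may help. The classes SharedSeed and OwnSeed are hypothetical and are not the Random class from the post. A static member function does not make a class a singleton; what matters is whether it touches static data. Static data exists once per program, so every instance, in every thread, sees the same value.

```cpp
#include <cassert>

// The seed is static data: one copy shared by all instances.
class SharedSeed {
public:
    static void newSeed(unsigned s) { seed_ = s; } // changes the seed for everyone
    unsigned seed() const { return seed_; }
private:
    static unsigned seed_;
};
unsigned SharedSeed::seed_ = 0;

// The seed is ordinary member data: one copy per instance.
class OwnSeed {
public:
    void newSeed(unsigned s) { seed_ = s; } // changes only this object
    unsigned seed() const { return seed_; }
private:
    unsigned seed_ = 0;
};

// Small self-check showing the difference.
void checkSeeds() {
    SharedSeed a, b;
    a.newSeed(7);            // static call made through an instance
    assert(b.seed() == 7);   // b sees the change too

    OwnSeed c, d;
    c.newSeed(7);
    assert(c.seed() == 7);
    assert(d.seed() == 0);   // d is unaffected
}
```

If newSeed() ends up calling srand(), the situation is likely the same as SharedSeed: rand() and srand() work on one hidden state for the whole program, and writing to such shared state from several threads without synchronization is a data race.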

Thanks

d_stranz
8th April 2012, 22:02
I don't see anything in the QtConcurrent docs that says the user of QtConcurrent must be QObject-derived.

In Mark Summerfield's book, Advanced Qt Programming, there is a chapter on QtConcurrent. One quote: "...the setup costs of creating so many threads (especially on Windows) are likely to be out of all proportion to the potential savings of spreading the work over secondary threads".

Perhaps with your nested threads, you have placed your program into this situation?

I am not sure about the relationship of QTime to the concurrency. QTime might need to access the hardware clock, and thereby cause an interrupt in the calling thread. This means that another thread might be able to run during the time the first thread is interrupted waiting for the clock. Without the QTime calls, you might be in the mode described by Summerfield.
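One way to take QTime out of the picture while testing is to time the whole run from outside the parallel code with a standard monotonic clock. The helper below is only a sketch under that assumption. It uses std::chrono::steady_clock from the C++ standard library, and elapsedMs is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so the result cannot be negative.
template <typename F>
double elapsedMs(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto t1 = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    assert(ms >= 0.0);
    return ms;
}
```

Wrapping the complete serial run and the complete parallel run in elapsedMs(), with no timing calls left inside the worker code, should show whether the speedup really depends on the timer being present.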

Edit: Thinking about that quote, and what you describe: if the cost of your function evaluation is low compared to the cost of creating a new thread to run the function, then you might be in the situation Summerfield describes. Perhaps you are trying to parallelize your algorithm too deeply, and should try to make it parallel at a coarser level. I imagine the optimization is probably more compute-intensive than the function eval, so try keeping the parallelism at just that level. Or try to refactor the parallelism at a different level, or reuse threads instead of constantly creating and destroying them.
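A rough sketch of what coarser granularity could look like, assuming standard C++ threads instead of QtConcurrent. The names cheapEval and evalChunked are hypothetical. Instead of starting one task per function evaluation, the input is split into a few contiguous chunks, and each worker thread handles a whole chunk, so the thread start-up cost is paid only a handful of times.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical cheap function evaluation.
inline double cheapEval(double x) { return x * x; }

// Evaluates cheapEval on every element using at most nWorkers threads,
// each working on one contiguous chunk of the input.
std::vector<double> evalChunked(const std::vector<double>& xs, unsigned nWorkers) {
    assert(nWorkers > 0);
    std::vector<double> out(xs.size());
    // Chunk size rounded up so that all elements are covered.
    const std::size_t chunk = (xs.size() + nWorkers - 1) / nWorkers;
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < nWorkers; ++w) {
        const std::size_t begin = std::min(xs.size(), w * chunk);
        const std::size_t end = std::min(xs.size(), begin + chunk);
        if (begin == end) break; // nothing left to hand out
        workers.emplace_back([&xs, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = cheapEval(xs[i]);
        });
    }
    for (auto& t : workers) t.join();
    return out;
}
```

A natural choice for nWorkers is std::thread::hardware_concurrency(), falling back to a small fixed number if it returns 0.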

Might be that QtConcurrent itself introduces too much overhead, and you need to implement at a lower level. Painful.

dlib
9th April 2012, 18:31
Any other input regarding presence of QTime object? Thanks.