Partial results with mappedReduced



0x4e84
10th March 2010, 11:09
Hi,

I just started with QtConcurrent and I am already fond of it. Great job ;-)

However, there is one thing I could not achieve yet, and maybe someone can help me out from here:

If I use mappedReduced, how can I get access to partial results before the whole list of inputs is processed? When I try to access e.g. resultAt(0), it sometimes works, but most of the time the program just crashes.

Typically, what I do is some number crunching in the mapFunction, and I make a picture in the reduceFunction from the results. Both mapFunction and reduceFunction are static.

QFuture<T> QtConcurrent::mappedReduced (const Sequence & sequence, MapFunction mapFunction, ReduceFunction reduceFunction)

What I would like is show the image from time to time, so that we can see the progress "Live" while the sequence is being processed...

Thanks in advance for any help!!

0x4e84

0x4e84
10th March 2010, 17:58
Hi again,

Maybe some code might help grasping the problem better. I'll try to reproduce the concerned code...

From the header:



class MyClass : public QObject, public QGraphicsPixmapItem
{
    Q_OBJECT
    ...
private:
    static SubResult doCalculate (const SubDefinition &sub);
    static void doPostProcess (QVector<int> &result, const SubResult &sub);
    QFuture< QVector<int> > future;
    QVector<SubDefinition> vectorOfSubs;
    QVector<int> partialMap;
    ...
};



And from source:



...

void MyClass::start()
{
    ...
    future = QtConcurrent::mappedReduced(vectorOfSubs, doCalculate, doPostProcess);
    watcher.setFuture(future);
    ...
}


SubResult MyClass::doCalculate(const SubDefinition &sub)
{
    ...
}

void MyClass::doPostProcess(QVector<int> &result, const SubResult &sub)
{
    ...
}

void MyClass::someFunctionCalledPeriodicallyWhenProcessStarted()
{
    ...
    partialMap = future.resultAt(0);
    ...
}


The program freezes at the "future.resultAt(0)" line when I try to call my "someFunctionCalledPeriodicallyWhenProcessStarted" before all the items in my vectors are processed...

Interestingly, when I tried to connect a SLOT to the watcher's resultReadyAt() SIGNAL, I did not get any signal at all. I only get the watcher's "finished()" and "progressValueChanged()" signals...

Any clue, anyone?


Best regards,

0x4e84

0x4e84
11th March 2010, 12:26
Never mind mappedReduced...

I could not find a decent solution for a "live preview" using the 'mappedReduced' way, so I gave it a try with 'mapped' instead.

=> Well, I must say... it works very well now! Smooth and quick!


So, problem solved, even if the initial question remains unanswered... It might not be possible at all, maybe.


Thanks and kind regards for those who took the time to read my posts,


0x4e84

0x4e84
12th March 2010, 14:38
Hi again,

I continue my monologue with a new question. I would love for someone to jump into the 'discussion'... It's getting frustrating to be read but not answered :confused:...

Anyway, here is my point:
I can't use 'mappedReduced' because I did not find out how to access intermediate results before the whole run is finished, and I found a way to do that with 'mapped' instead, which is good!...
... but there is one feature that I really miss from 'mappedReduced': its nice memory management! As soon as a result is delivered by the map function and goes through the reduce function, its memory is released, which keeps the total memory in use at any moment very low.

With 'mapped', the memory of all results is simply accumulated (I easily end up with 500MB usage or more, while I could stay below 50MB with 'mappedReduced') and I have to delete the future's result the hard way at the end...

I call a postprocessing function when a resultReadyAt() signal is emitted, using the 'resultAt()' data, and I'd like this memory to be released afterwards, since I don't need it any more.
My result is a struct containing a QVector, and the latter holds all the heavy data. But calling 'clear()' or 'resize(0)' on that vector does not help. The data is still there (I tested!! I can read it out even after the 'clear()').

The code looks like this:



void CMyClass::resultReadyAt(int pos)
{
    doPostProcess(future->resultAt(pos));
    future->resultAt(pos).iVector.clear(); // does not help !!
    future->resultAt(pos).iVector.resize(0); // does not help either !!
    qDebug() << "Items left in the vector" << pos << ":" << future->resultAt(pos).iVector.count(); // Still indicates the same amount of data!!
    //delete (&future->resultAt(pos).iVector); // does not work: causes the program to crash !!
}
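
A likely explanation for the behaviour above, offered as an assumption rather than a confirmed diagnosis: QFuture::resultAt() is declared as returning T by value, so clear() and resize(0) act on a temporary copy and the element stored inside the future is never touched. The sketch below reproduces that effect in plain standard C++ without Qt. ResultStore, SubResult and clearViaResultAt are hypothetical stand-ins invented for this illustration; they are not Qt classes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the poster's result struct.
struct SubResult {
    std::vector<int> iVector;
};

// Hypothetical stand-in for a future's result store. resultAt() returns a
// copy, so callers cannot modify the stored element through it.
class ResultStore {
public:
    explicit ResultStore(std::vector<SubResult> results)
        : results_(std::move(results)) {}

    SubResult resultAt(std::size_t pos) const {
        assert(pos < results_.size());
        return results_[pos];  // returned by value
    }

private:
    std::vector<SubResult> results_;
};

// Mirrors the forum code: clears a temporary, then reports how many items
// are still held by the store at that position.
inline std::size_t clearViaResultAt(const ResultStore &store, std::size_t pos) {
    store.resultAt(pos).iVector.clear();        // clears the temporary copy only
    return store.resultAt(pos).iVector.size();  // stored data is unchanged
}
```

If this is indeed the cause, the memory can only be released by whoever owns the stored results, which is consistent with the data still being readable after clear().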



If you know about a solution, or even just have a suggestion, feel free to let me know, I would be very grateful!

Thanks in advance!!


0x4e84

wysota
12th March 2010, 16:31
If I use mappedReduced, how can I get access to partial results before the whole list of inputs is processed? When I try to access e.g. resultAt(0), it sometimes works, but most of the time the program just crashes.

I don't think you can do that in a general case. If you want to try, install a QFutureWatcher on the future returned by the mappedReduced() call. If the future returned represents a list (which I doubt - that's why I say I don't think it is possible to obtain what you want), you would then be signalled when particular results are ready.
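
One possible workaround for a live preview, sketched here in plain standard C++ rather than with Qt Concurrent: keep the accumulator outside the future and let both the reduce step and the preview lock the same mutex, so a snapshot can be copied at any time. SharedImage, mapFunction and mapReduceInto are hypothetical names invented for this sketch, and the striding over threads is an assumption, not how mappedReduced schedules its work.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical accumulator shared between the reduce step and the preview.
class SharedImage {
public:
    explicit SharedImage(std::size_t size) : pixels_(size, 0) {}

    // Reduce step: merge one mapped result into the accumulator.
    void reduce(std::size_t index, int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(index < pixels_.size());
        pixels_[index] = value;
    }

    // Preview: copy the current partial state; may be called at any time.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pixels_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> pixels_;
};

// Hypothetical map function standing in for the number crunching.
inline int mapFunction(int input) { return input * input; }

// Maps every input on a few threads and reduces into `image` as results arrive.
inline void mapReduceInto(const std::vector<int> &inputs, SharedImage &image,
                          unsigned threadCount) {
    if (threadCount == 0) threadCount = 1;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&inputs, &image, t, threadCount] {
            // Each worker handles a strided subset of the inputs.
            for (std::size_t i = t; i < inputs.size(); i += threadCount)
                image.reduce(i, mapFunction(inputs[i]));
        });
    }
    for (std::thread &w : workers) w.join();
}
```

In a GUI, a timer on the main thread could call snapshot() while mapReduceInto() runs on another thread; entries not yet computed simply keep their initial value.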


I can't use 'mappedReduced' because I did not find out how to access intermediate results before the whole run is finished, and I found a way to do that with 'mapped' instead, which is good!...
... but there is one feature that I really miss from the 'mappedReduced': its nice memory management! As soon as a result is delivered by the 'mapped' function and goes through the 'reduce' function, its memory is released, which keeps the total used memory at any moment very low.

With 'mapped', the memory of all results is simply accumulated (I easily end up with 500MB usage or more, while I could stay below 50MB with 'mappedReduced') and I have to delete the future's result the hard way at the end...

I call a postprocessing function when a resultReadyAt() signal is emitted, using the 'resultAt()' data, and I'd like this memory to be released afterwards, since I don't need it any more.
My result is a struct containing a QVector, and this latter contains all the heavy data. But calling 'clear()' or 'resize(0)' on that vector does not help. The data is still there (I tested!! I can read it out even after the 'clear()').


I can think of some things that could be used. For example you could process the data in chunks - divide the whole domain into subdomains, process one subdomain using Qt Concurrent, then partially reduce the result, free the memory and proceed with the next subdomain. Instead of a flat line of parallel execution you'd get a kind of a tree of executions.
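
The chunked approach described above could look roughly like the following. This is a minimal sketch in standard C++ using std::async in place of Qt Concurrent; mapItem and processInChunks are hypothetical names, and the 1000-element result is only a placeholder for the heavy per-item data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical map step producing a heavy per-item result.
inline std::vector<int> mapItem(int input) {
    return std::vector<int>(1000, input);
}

// Processes the inputs one chunk at a time: map the chunk in parallel,
// reduce its results into `total`, release them, then start the next chunk.
inline long long processInChunks(const std::vector<int> &inputs,
                                 std::size_t chunkSize) {
    assert(chunkSize > 0);
    long long total = 0;
    for (std::size_t begin = 0; begin < inputs.size(); begin += chunkSize) {
        const std::size_t end = std::min(begin + chunkSize, inputs.size());

        // Map: one asynchronous task per item of the current chunk.
        std::vector<std::future<std::vector<int>>> pending;
        for (std::size_t i = begin; i < end; ++i)
            pending.push_back(std::async(std::launch::async, mapItem, inputs[i]));

        // Partial reduce: fold each result into the running total.
        for (auto &task : pending) {
            const std::vector<int> result = task.get();
            total = std::accumulate(result.begin(), result.end(), total);
        }
        // `pending` and every per-item result are destroyed here, so at most
        // one chunk of heavy data is alive at any time.
    }
    return total;
}
```

Peak memory then scales with the chunk size rather than with the whole input, at the cost of a synchronization point at the end of every chunk.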

Another solution I can think of is to leave mapped() and use QRunnable objects to process each chunk of data. When a particular runnable is finished, you can process its result in the main thread and get rid of the runnable right away.
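
The runnable idea can be sketched without Qt as well. In the standard C++ outline below, worker threads run jobs and hand each finished job to the calling thread, which post-processes it and lets it go out of scope immediately. Job and runJobs are hypothetical names for this illustration; a Qt version would presumably use QRunnable and QThreadPool instead of raw threads.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job: owns its input and its (heavy) result.
struct Job {
    int input = 0;
    std::vector<int> data;
    void run() { data.assign(1000, input); }  // placeholder for real work
};

// Runs one job per input on `threadCount` workers. The calling thread
// post-processes each finished job and frees it right away.
inline void runJobs(const std::vector<int> &inputs, unsigned threadCount,
                    const std::function<void(const Job &)> &postProcess) {
    assert(threadCount > 0);
    std::atomic<std::size_t> next{0};
    std::mutex mutex;
    std::condition_variable ready;
    std::queue<std::unique_ptr<Job>> finished;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&] {
            // Each worker repeatedly claims the next unprocessed input.
            for (std::size_t i = next.fetch_add(1); i < inputs.size();
                 i = next.fetch_add(1)) {
                std::unique_ptr<Job> job(new Job);
                job->input = inputs[i];
                job->run();
                {
                    std::lock_guard<std::mutex> lock(mutex);
                    finished.push(std::move(job));
                }
                ready.notify_one();
            }
        });
    }

    for (std::size_t done = 0; done < inputs.size(); ++done) {
        std::unique_ptr<Job> job;
        {
            std::unique_lock<std::mutex> lock(mutex);
            ready.wait(lock, [&] { return !finished.empty(); });
            job = std::move(finished.front());
            finished.pop();
        }
        postProcess(*job);
    }  // each job, and its data, is destroyed at the end of its iteration

    for (std::thread &w : workers) w.join();
}
```

Because every result is released right after post-processing, memory use stays close to the number of jobs in flight instead of growing with the number of inputs.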

0x4e84
13th March 2010, 06:40
Hi wysota,

Thanks for joining in and for helping me find a solution to my issue ;-)


I don't think you can do that in a general case. If you want to try, install a QFutureWatcher on the future returned by the mappedReduced() call. If the future returned represents a list (which I doubt - that's why I say I don't think it is possible to obtain what you want), you would then be signalled when particular results are ready.

Actually, I do use a QFutureWatcher. The result of my 'mappedReduced()' call was a large vector, but I did not manage to get access to it before the whole thing was processed. Besides, the 'resultReadyAt()' signal was never triggered... except at the end. I think this nice method is just not meant to be used the way I wanted to use it.



I can think of some things that could be used. For example you could process the data in chunks - divide the whole domain into subdomains, process one subdomain using Qt Concurrent, then partially reduce the result, free the memory and proceed with the next subdomain. Instead of a flat line of parallel execution you'd get a kind of a tree of executions.

I see your point. But wouldn't it add a lot of overhead to start new QtConcurrent runs over and over? I thought this was quite an "expensive" thing to do, and I might then lose the benefit of multithreading with QtConcurrent, or?...



Another solution I can think of is to leave mapped() and use QRunnable objects to process each chunk of data. When a particular runnable is finished, you can process its result in the main thread and get rid of the runnable right away.

I have thought about this approach too, using a ThreadPool, and in the end it might be the most flexible and powerful way after all, but I have not tried it out yet.
'mapped' and 'mappedReduced' are so convenient to use because they automatically and efficiently make use of the available CPU cores.
I would like to investigate the 'mapped' alternative a bit further and see if I can find an elegant way to solve the memory usage problem.
If I don't succeed, I'll probably redesign the whole thing around Thread Pools...


Thanks again for the suggestions.

I'll update this thread with my further results, since they might interest other folks too, and there is not much information available yet on the forums in general about QtConcurrent usage.


Kind regards,


0x4e84

wysota
13th March 2010, 08:02
I see your point. But wouldn't it add a lot of overhead to start new QtConcurrent runs over and over? I thought this was quite an "expensive" thing to do, and I might then lose the benefit of multithreading with QtConcurrent, or?...
I don't think this is much more expensive than having a single large vector of data to process using a single concurrent call. It's only a matter of proper synchronization. The only thing you lose is that if you have many execution units (approaching the 'manycore' barrier), they won't be used effectively near the end of a chunk, because you have to wait until one chunk ends before starting the next.


I have thought about this approach too, using a ThreadPool, and in the end it might be the most flexible and powerful way after all, but I have not tried it out yet.
'mapped' and 'mappedReduced' are so convenient to use because they automatically and efficiently make use of the available CPU cores.
Runnables do that as well - when concurrent runs out of threads it will queue jobs until threads are available, just like mapped() does.


If I don't succeed, I'll probably redesign the whole thing around Thread Pools...
mapped() uses the thread pool as well.


I'll update this thread with my further results, since they might interest other folks too, and there is not much information available yet on the forums in general about QtConcurrent usage.
There is some info about using concurrent or similar approaches in the article here: http://doc.trolltech.com/qq/qq27-responsive-guis.html

0x4e84
13th March 2010, 20:11
Runnables do that as well - when concurrent runs out of threads it will queue jobs until threads are available, just like mapped() does.


You convinced me with that sentence! I will try it.


There is some info about using concurrent or similar approaches in the article here: http://doc.trolltech.com/qq/qq27-responsive-guis.html

I knew this one, but I'll read it again, more thoroughly...

I'll be back... with some news about how it turned out.

Thanks for your support, wysota!


Kind regards,

0x4e84

0x4e84
16th March 2010, 10:29
Hi again!

I re-implemented my calculation methods using QRunnable and everything turned out smoothly: same performance as with "mapped", the threads are spread over all the available cores as before, but now there is no more wasted memory, since I can get rid of the data I no longer need by simply deleting the runnables after post-processing them!
I like it when a plan comes together! ;-)

Thanks again, wysota, for your advice.

The program I'm working on is a fractal calculator (yet another one!). The idea is to use it as a pretext to work on calculation optimization and multithreading with Qt. Besides, there are a few user interface things I am also trying out...
If there is some interest, I'll post a link here to the software once it is available (open source). Just let me know.


Kind regards,

0x4e84