View Full Version : Tuning QtConcurrent::map()



Cruz
9th July 2021, 09:46
Hi there,

I am experimenting with the QtConcurrent framework in order to launch multiple, long running threads that compute some scientific stuff.
I use a blocking QtConcurrent::blockingMap call with a member function like so:



void Experimenter::runExperiments()
{
    QtConcurrent::blockingMap(cases, [this] (ExperimentConfig& ex) { processCase(ex); });
}

void Experimenter::processCase(ExperimentConfig& ex)
{
    // compute many things
}


This works in the sense that it launches some 20 threads on my machine, but only one of them is actively working at a time. Watching the threads in htop, I can see that only one core and one process is pegged near 100%. The machine has plenty of memory available and barely 10% of it is used. Any ideas as to how I can make this more efficient?

d_stranz
9th July 2021, 16:42
Any ideas as to how I can make this more efficient?

If I am decoding the documentation correctly, it seems that QtConcurrent::blockingMap() does create multiple threads, but they run sequentially and block until the entire sequence has been processed. You should probably try using QtConcurrent::map() and QFutureWatcher instead to run the calculations in parallel and be notified when all have finished.

Is your "sequence" (cases) access-protected by a mutex or semaphore? If my understanding of the documentation is incorrect, then locking the sequence when a thread is running could be the cause, since it would be blocking other threads from accessing their elements of the sequence.
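For what it's worth, the non-blocking shape described above can be sketched in standard C++ as well. This is an analogy, not the Qt API: std::async stands in for the per-element dispatch that QtConcurrent::map() performs, and checking the returned futures later replaces QFutureWatcher's finished signal. All names here (mapAsync, the doubling workload) are illustrative only:

```cpp
#include <future>
#include <vector>

// Analogy only: apply a worker lambda to every element in parallel and
// return the futures, so the caller regains control immediately and can
// check for completion later (the role QFutureWatcher plays in Qt).
std::vector<std::future<void>> mapAsync(std::vector<int>& items)
{
    std::vector<std::future<void>> futures;
    for (int& item : items)
        futures.push_back(std::async(std::launch::async, [&item] {
            item *= 2; // placeholder for the real per-case computation
        }));
    return futures; // returns while the workers may still be running
}
```

The caller can then do other work and call get() or wait() on each future when the results are actually needed, instead of blocking at the call site.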

Cruz
11th July 2021, 16:50
I have tried three different approaches to this multithreading problem and I am observing the same result no matter which way I go.
The first version is the blocking QtConcurrent::blockingMap() call:



class Experimenter : public QThread
{
    QVector<ExperimentConfig> cases;
};

void Experimenter::runExperiments()
{
    QtConcurrent::blockingMap(cases, [this] (ExperimentConfig& ex) { processCase(ex); });
}

void Experimenter::processCase(ExperimentConfig& ex)
{
    // compute many things
}


In the second version, I use QtConcurrent::run() to launch threads and watch QFuture objects to see when a thread has finished:


void Experimenter::runExperiments()
{
    static const int threads = 5;
    QVector<QFuture<void>> results(threads); // default-constructed futures report isFinished() == true
    bool finished = false;
    int k = 0;
    while (!finished)
    {
        for (int i = 0; i < threads; i++)
        {
            if (results[i].isFinished() && k < cases.size())
            {
                qDebug() << "Thread" << i << "is now working on case" << k;
                results[i] = QtConcurrent::run(this, &Experimenter::processCase, cases[k++]);
            }
        }

        if (k >= cases.size())
            finished = true;
        sleep(10); // QThread::sleep(), in seconds
    }
}


In the third version, I start() my own QThread objects and watch their isRunning() status to see when they have finished:



class WorkerThread : public QThread
{
public:
    ExperimentConfig ex;

    void run() override
    {
        processCase(ex);
    }

    void processCase(ExperimentConfig& ex);
};

void Experimenter::runExperiments()
{
    static const int threads = 5;
    WorkerThread thread[threads];
    bool finished = false;
    int k = 0;
    while (!finished)
    {
        for (int i = 0; i < threads; i++)
        {
            if (!thread[i].isRunning() && k < cases.size())
            {
                qDebug() << "Thread" << i << "is now working on case" << k;
                cases[thread[i].ex.id] = thread[i].ex; // write the finished result back
                thread[i].ex = cases[k++];
                thread[i].start();
            }
        }

        if (k >= cases.size())
            finished = true;
        sleep(10);
    }
}


The observed behavior in all three cases is the same: the threads are interleaved correctly and they do run at the same time, but apparently they all run on the same core. In htop, I can see only one core peaking at 100%, and the threads I created fluctuate between 0% and 100%, mostly one at a time, sometimes two threads adding up to 100%. It seems that all started threads are scheduled on the same core and are thus interrupted a lot by round-robin scheduling. When I start a second process that launches its own threads, I can see two cores loaded to 100%. Why aren't the threads running on different cores?

The question of whether cases is mutexed is good thinking, but no, it is not. Or at least not that I am aware of.
cases is a QVector of objects as shown in the first code snippet above. Parallel access through the [] operator should be no problem, right?
Otherwise, the threads produce qDebug output and perform unmutexed file writes, which hasn't been an issue so far.
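Concurrent access to distinct elements through operator[] is indeed safe, as long as the container is not resized while the threads run. A minimal standard-C++ sketch of the pattern (using std::vector and std::thread rather than the Qt types, purely for illustration; the function name and workload are made up):

```cpp
#include <thread>
#include <vector>

// Fill a pre-sized vector from n threads, each writing only its own slot.
std::vector<long> parallel_fill(int n)
{
    std::vector<long> results(n); // sized up front; no reallocation while threads run
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&results, i] {
            long sum = 0;                   // stand-in per-element work
            for (int j = 0; j <= 1000; ++j)
                sum += j;
            results[i] = sum + i;           // distinct index per thread: no mutex needed
        });
    for (auto& t : workers)
        t.join();
    return results;
}
```

The key condition is that every thread touches a different index and nothing triggers a reallocation; under those assumptions no locking is required, in Qt containers as well as standard ones.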

d_stranz
11th July 2021, 21:23
Why aren't the threads running on different cores?

No idea. The docs say that Qt Concurrent is designed to take advantage of multiple cores and to scale as the number of cores increases.

I wonder if there is an OS setting you need to change to allow a process to spawn threads across multiple cores?

Cruz
12th July 2021, 02:05
I replaced my complex processCase() function with a few simple lines that just stress the CPU, and it turns out the threading through QtConcurrent works as expected: the started threads execute at 100% on different cores. So the error was indeed a mutex somewhere deep down that all the threads tried to lock, which meant they could only run one at a time. This is embarrassing, but thanks for the help!
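This substitution trick is worth spelling out, since it cleanly separates "the scheduling is broken" from "my workload serializes itself". A stand-in like the following (hypothetical; the original processCase() is not shown in the thread) does pure CPU work with no I/O, no locks, and no shared state, so nothing can serialize the workers. If such a stand-in pegs one core per thread, the threading machinery is fine and the real workload must be blocking on something shared:

```cpp
#include <cstdint>

// Hypothetical CPU-only stand-in for processCase(): no I/O, no locks,
// no shared state. Iterates a 64-bit linear congruential generator;
// returning the result keeps the loop from being optimized away.
std::uint64_t burnCpu(std::uint64_t iterations)
{
    std::uint64_t x = 1;
    for (std::uint64_t i = 0; i < iterations; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}
```

Mapping burnCpu over the cases with a large iteration count should load as many cores as the thread pool allows; if it does but the real workload does not, the difference between the two is where to look for the hidden lock.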

d_stranz
12th July 2021, 16:02
That's good to know. So I guess the difference between the blocking and non-blocking QtConcurrent map methods is that the method containing the call to QtConcurrent::blockingMap() will not return until all concurrent processing is complete, whereas QtConcurrent::map() launches the threads and returns immediately.

From a design perspective, is there a reason why you chose to use the blocking call?

Cruz
12th July 2021, 18:35
From a design perspective, is there a reason why you chose to use the blocking call?


Because with blockingMap() it's simpler to do things after all threads have finished. For example, I save the computed data once all cases have been processed. With the non-blocking variants, you have to watch QFuture objects in a loop or set up additional signal and slot connections to know when all the threads are done.
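The trade-off shows up in miniature in standard C++ as well. Again an analogy, not the Qt API, with made-up names: in the blocking shape, the post-processing step (here a simple sum, standing in for saving the data) can follow the parallel phase directly in the same function, which is exactly the convenience blockingMap() provides.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Blocking analogy of blockingMap(): returns only after every element
// has been processed, so post-processing can follow immediately.
int processAllThenSum(std::vector<int>& items)
{
    std::vector<std::future<void>> futures;
    for (int& item : items)
        futures.push_back(std::async(std::launch::async, [&item] {
            item += 1; // placeholder for the real per-case computation
        }));
    for (auto& f : futures)
        f.wait(); // the "blocking" part

    // Safe to use all results right here, just as after blockingMap().
    return std::accumulate(items.begin(), items.end(), 0);
}
```

With the non-blocking shape, the accumulate step would instead have to live in a completion callback or behind a poll, which is the extra wiring the blocking call avoids.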