Debugging a multi-threaded Windows app



Cruz
10th April 2012, 00:40
Hello there!

I'm facing a multi-threaded Windows application that appears to crash randomly. The project is in Eclipse. The general structure is that there is a GUI and a worker thread that generates all kinds of data, which are displayed in the GUI. No signals and slots are used. The data structures are global; the worker thread writes into them and the GUI thread reads from them. At some point the software always crashes with an illegal use of the [] operator on a QList.

ASSERT failure in QList<T>::operator[]: "index out of range" ...

There are lots of QLists and even more [] operators being used.

Can anyone please give me advice on how to debug this? I'm hoping for more than just "use a debugger", because I wouldn't even know where to place a breakpoint. What I need is something like a post-mortem stack trace to see where exactly the [] operator failed.

Thanks
Cruz

ChrisW67
10th April 2012, 05:14
ASSERT failure in QList<T>::operator[]: "index out of range" ...

There are lots of QLists and even more [] operators being used.

... and no range checks coded into your application ;)

What I need is something like a post mortem stack trace to see where exactly the [] operator failed.
Then ask your debugger (yes, the answer is use a debugger) for a backtrace after the program terminates abnormally. Generally, if you run your debugger from inside an IDE this will happen automatically. If you are using the GNU compilers and debugger then "bt" (http://sourceware.org/gdb/current/onlinedocs/gdb/Backtrace.html#Backtrace) is the magic command ("thread apply" may also be needed). If you are using the Microsoft compiler and cdb then "k" (http://msdn.microsoft.com/en-us/library/windows/hardware/ff551943%28v=vs.85%29.aspx) seems to be the magic command.
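
If catching the crash in a debugger at the right moment is awkward, another way to find the failing access is to route the suspicious [] calls through a checked helper that records the call site. The following is only a sketch: checkedAt and CHECKED_AT are hypothetical names that are not part of Qt, std::vector stands in for QList so the example compiles without Qt, and it assumes exceptions are enabled in the build.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): bounds-checked read access that reports
// the file and line of the caller instead of a bare "index out of range".
template <typename Container>
const typename Container::value_type& checkedAt(const Container& c, long index,
                                                const char* file, int line)
{
    if (index < 0 || static_cast<std::size_t>(index) >= c.size()) {
        std::ostringstream msg;
        msg << file << ":" << line << ": index " << index
            << " out of range (size " << c.size() << ")";
        throw std::out_of_range(msg.str());
    }
    return c[static_cast<std::size_t>(index)];
}

// Captures the call site automatically.
#define CHECKED_AT(container, index) \
    checkedAt((container), (index), __FILE__, __LINE__)
```

Replacing a read such as list[i] with CHECKED_AT(list, i) makes the exception message name the offending source line. In gdb, "catch throw" stops where the exception is raised, so "bt" then shows the full stack. This only locates the bad access; as long as another thread modifies the container without locking, the check itself is still subject to the same race.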

Cruz
10th April 2012, 07:59
Alright, thanks. I guess I'm going to have to brush up on my debugging skills a bit.

I have noticed before, in other unsafe multi-threaded applications, that QList seems to increment size() first, before the element at the last index is actually available.

Berryblue031
10th April 2012, 10:49
QList is not thread-safe; the fact that your application hasn't been crashing until now is probably a miracle :P

The fact that your application is based on a bunch of global variables is really gross (I would expect that kind of solution from a high school student, not a professional). You should consider a major refactor.

But back to your problem: that error means you are accessing an index that doesn't exist, probably because the different threads are accessing the QList at the same time while one of them modifies it.

Working with your existing system I would do two things:


1. Since the data structures (like QList) are not thread-safe, you need to add a QMutex for each of your global data structures, lock the mutex every time you need to access the data structure, and unlock it when you are finished. The best way to do this is to wrap your existing data structures in a class and provide pass-through methods for accessing the actual data (this way you can never forget the mutex).
2. Always check array index ranges.


For example:

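#include <QDebug>
#include <QList>
#include <QMutex>
#include <QMutexLocker>

class ThreadSafeList
{
public:
    void append(int i);
    int value(int index);

private:
    QList<int> mList;
    QMutex mMutex;
};

void ThreadSafeList::append(int i)
{
    // The locker unlocks the mutex automatically when it goes out of scope.
    QMutexLocker locker(&mMutex);
    mList.append(i);
}

int ThreadSafeList::value(int index)
{
    QMutexLocker locker(&mMutex);
    // Reject negative indices as well as indices past the end.
    if (index >= 0 && index < mList.count())
    {
        return mList.at(index);
    }

    qDebug() << "ACCESSING INVALID INDEX" << index;
    return 0; // fallback value for an invalid index
}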



**Disclaimer: this code is only pseudocode written off the top of my head for explanation purposes; modify it to fit what you actually have.
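
One gap worth noting in a per-element wrapper like the one above: a caller that loops over the indices can still see the list change between two calls, and a fallback return value of 0 cannot be told apart from a stored 0. A possible variation, sketched here in standard C++ so it compiles without Qt (std::mutex and std::vector are stand-ins for QMutex and QList, and ThreadSafeIntList and snapshot are made-up names), reports invalid indices explicitly and lets the reader copy the whole list under a single lock.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical standard C++ counterpart of the wrapper idea.
class ThreadSafeIntList
{
public:
    void append(int i)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mList.push_back(i);
    }

    // Writes the element to 'out' and returns true if 'index' is valid;
    // returns false and leaves 'out' untouched otherwise.
    bool value(std::size_t index, int& out) const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (index >= mList.size())
            return false;
        out = mList[index];
        return true;
    }

    // Copies the whole list while holding the lock once, so the caller can
    // iterate over a consistent state without further locking.
    std::vector<int> snapshot() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mList;
    }

private:
    mutable std::mutex mMutex; // mutable so const readers can lock it
    std::vector<int> mList;
};
```

The snapshot costs one copy per call, so whether it is acceptable depends on how large the lists are and how often the reader refreshes.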

ChrisW67
10th April 2012, 11:12
As Berryblue031 points out, if you have two threads accessing the same data structure without any form of locking or access control, then you are asking for the application to crash often and unpredictably. This is not the fault of QList. There is a bunch of information in the Qt manuals about thread safety, including mechanisms for handling shared data access.

Cruz
10th April 2012, 12:46
Thanks to both of you for your hints and remarks. Trust me, I am fully aware of the thread safety problem here (or rather the lack of thread safety). The unsafe architecture that I described in my OP is the result of a special requirement. This software is running on a robot. First and foremost it has to be fast, fast, fast, and it has to loop at a steady pace of 100 Hz. Waiting for mutexes to be released and the overhead of copying data need to be avoided at all costs in this special case. That said, having a global, central data structure where one thread writes and another thread reads is not a problem per se, because reading and writing aligned 64-bit data types appears to be atomic on x86 systems. And at the bottom line, all sensory input that is being written breaks down to a bunch of floats or doubles. So the worst thing that can happen is that the reading thread reads a data set that is not entirely in sync, say because it has only been half written by the writer thread, which is no problem at all.
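
The atomicity assumption can also be stated in the code instead of being left to the hardware. The following is a minimal sketch, assuming a C++11 compiler; SensorState, SensorSample, publish, readSample and the field names are all hypothetical. With relaxed ordering, loads and stores of std::atomic<double> typically compile to plain moves on x86-64, so this should not introduce locking, although that is worth confirming with is_lock_free() on the actual target.

```cpp
#include <atomic>

// Hypothetical sensor record shared between the worker and the GUI thread.
struct SensorState
{
    std::atomic<double> jointAngle{0.0};
    std::atomic<double> gyroRate{0.0};
};

// Plain copy handed to the reader.
struct SensorSample
{
    double jointAngle;
    double gyroRate;
};

// Worker thread: each store is atomic on its own, but the two fields are
// not updated together as one unit.
inline void publish(SensorState& s, double angle, double rate)
{
    s.jointAngle.store(angle, std::memory_order_relaxed);
    s.gyroRate.store(rate, std::memory_order_relaxed);
}

// GUI thread: may observe a mix of old and new fields, but never a torn
// (half-written) double.
inline SensorSample readSample(const SensorState& s)
{
    SensorSample out;
    out.jointAngle = s.jointAngle.load(std::memory_order_relaxed);
    out.gyroRate = s.gyroRate.load(std::memory_order_relaxed);
    return out;
}
```

This matches the tolerance described above for data sets that are not entirely in sync. It only covers fixed-size fields, though; a container that reallocates while it grows is not made safe by it.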

I do realize that using QList in a thread-unsafe manner was not a smart choice and that it leads to the problems I am facing now. However, this problem can be solved once I can identify the exact place where it occurs.