PDA

View Full Version : Loading data in a thread



Cruz
28th October 2010, 12:08
Hello there!

I'm using a QThread to load approximately 100 MB of motion capture data. The reason why I use a thread is that it takes some time to load and process the data and when the program is started, I want to be able to watch the data in a 3D environment right away from the start and not wait until all the data is loaded. I have observed a problem several times and I believe it must be a thread related issue. Basically, some frames of the data, which is loaded into a QList, seem to be messed up sometimes. But here is a more detailed explanation.

So, the loader thread reads the data from a file line by line, writes each line in a struct and appends the struct to a QList like this:



QList<DataStruct> data;

struct.handX = line[0];
struct.handY = line[1];
...

data << struct;


The 3D visualization object accesses directly the data QList of the file loader thread / object even during the time the data is loaded. When displaying the frames on the screen, I check if data.size() is larger than the frame I want to display and if this is true, I assume the frame is readily loaded.



if (frameIndex + 1 < loader.data.size() - 1)
{
frameIndex++;
displayFrame(loader.data[frameIndex]);
}


This works great aside from the following effect. Sometimes I would encounter a frame which is severely out of place, as if the order of the frames in the QList wasn't right. I can rewind and look at the same spot again and even after the loader thread has finished loading, the same spot stays screwed up during the entire runtime of the program. I checked if there is an error in the data file at this spot and it looks clean. Also if I quit and reload the program, there is no problem with the same frame anymore, but the problem might occur at a different frame that was ok before. So I think it must be a thread related issue, but I don't understand why.

First of all, I check the size() of the QList, so I never access a QList index that hasn't been already loaded with a frame. And second, I can understand that it would be an issue if I try to read a frame at the same time when it's just being written by the loader thread, but when I rewind and look at the same QList index later, then it should be clean because the writing has already been completed. And it is of course impossible to reliably reproduce, so I don't even know how to approach this problem. Any ideas?

Thanks
Cruz

high_flyer
28th October 2010, 12:41
The 3D visualization object accesses directly the data QList of the file loader thread / object even during the time the data is loaded.
If this point in code is not mutexted, that is your problem.

Cruz
28th October 2010, 13:06
I don't see what a mutex would do in this case. Only the thread writes to the QList. The visualizer only reads it and never changes it. So the only problem can be if the visualizer tries to access excactly the same index in the QList that is written by the loader. After the loader has completed filling the QList with frames, and doesn't touch it anymore, how can it be that there are still some frames screwed up and what would a mutex do about it?

wysota
28th October 2010, 13:17
I don't see what a mutex would do in this case. Only the thread writes to the QList. The visualizer only reads it and never changes it. So the only problem can be if the visualizer tries to access excactly the same index in the QList that is written by the loader. After the loader has completed filling the QList with frames, and doesn't touch it anymore, how can it be that there are still some frames screwed up and what would a mutex do about it?

Operations on items are not atomic. One thread can partially write the data and the other thread might already try to access this bogus data. You could get a nice effect in a situation when there is a pointer in an object that should be pointing to some data but before the pointer gets initialized another thread reads it and tries to dereference it.

You may use a QReadWriteLock instead of a plain mutex but if you only have two threads then both these mechanisms will be equivalent.

high_flyer
28th October 2010, 13:21
So the only problem can be if the visualizer tries to access exactly the same index in the QList that is written by the loader.
Right.
And I am not sure if this behavior is defined.
So if during the writing of the list, it gets disrupted because of the access on it from your gui, resulting in messedup content of the list element, your messedup element stays there since you are only filling it ones.
It vary bad practice not to mutex shared data between threads, even if you think that your case does not need it.
I guess the easiest thing would be to try it, its just few lines of code.

Having said that, you didn't show your code, so it is very possible you are doing something wrong somewhere completely else.

Cruz
28th October 2010, 13:37
I don't believe the gui access is messing things up theory. Even if we put the loader.data.size() check aside and assume that yes, the gui does read an element in the QList that hasn't been fully written yet. So what? The gui gets corrupted data and displays a frame that doesn't look right. But it does not influence the writing of the loader thread. The loader thread will not skip this frame just because there was a read acces to the same memory location, will it?

Sure, I can try the mutex and maybe it will even fix it, but that's not enough. :) I would like to truely understand what goes wrong there and why.


Having said that, you didn't show your code, so it is very possible you are doing something wrong somewhere completely else.

Of course that's always a possibility. But I did test my program thoroughly and all parts seem to work. The fact that the problem occurs rarely and at different places in the data is a strong hint of thread issues. Besides, the whole code is too much to show and I'm not asking anyone to spend time debugging my program.

high_flyer
28th October 2010, 13:45
But it does not influence the writing of the loader thread.
Show me where this is defined. :)

wysota
28th October 2010, 14:06
Even if we put the loader.data.size() check aside
Any such check is useless as conditions may change between the time you make the check and the time you access the data.

The gui gets corrupted data and displays a frame that doesn't look right. But it does not influence the writing of the loader thread.
You are assuming none of the methods in the gui modify the data structure in any way. If you have an implicitly share class such as QList, you may easily fall out of sync. These classes are not thread-safe and I don't know what we are arguing about. The sole statement should be enough to make you synchronize access to the non-thread-safe object.

Cruz
28th October 2010, 14:18
I don't know! Show me why it's not like that?

Even if threads are interweaved, somewhere deep in the core it will come down to a bunch of instructions being executed one by one. Some of them read a register, some of them write a register. I can understand that if say the writing of a double is stopped in the middle, half of the bytes are right and half of the bytes are wrong. If another few instructions now try to read this double, they will not the number they expect. But when the writing of it continues, eventually the double will be written correctly. Or at what point does a read instruction influence a write instruction that is supposed to be executed at a later time?

Added after 7 minutes:


Any such check is useless as conditions may change between the time you make the check and the time you access the data.

Yes, the data might grow even further, nullifying the chance of reading the same element that is being written.




You are assuming none of the methods in the gui modify the data structure in any way. If you have an implicitly share class such as QList, you may easily fall out of sync.

That's correct. The data structure is not returned by any method, there are no signals and slots transporting the QList, it's just a plain old access to loader.data[index]. I don't know about the inner workings of the implicitely shared structures. Does a read access like that modify them ever?



These classes are not thread-safe and I don't know what we are arguing about. The sole statement should be enough to make you synchronize access to the non-thread-safe object.

I wasn't aware of arguing, so far I'm only asking question as to why does it happen and I expect more of an answer than "because it's not thread safe". Is it possible, that reading the same element in a QList, that is being written by another thread, screws up the element for good?

wysota
28th October 2010, 14:18
If you have a class that internally holds a pointer to some structure which gets copied, it may happen that one thread accesses one copy of the data and the other accesses the other copy. Now depending on which of the copies becomes the "real" copy after the split, the changes made to one or the other will be lost (of course including a situation when only one of the threads modifies the data). The split operation (but not copying the underlying data) itself is atomic (Qt guarantees that) but not the complex operation that involves the split.

Cruz
28th October 2010, 14:54
Ok I think I understand. Is it because the [] operator is overloaded and might cause a copy of the whole data, because one thread is reading it and another one is writing it? And then what happens is what Wysota said?

If this is true, the problem would vanish if I used a plain old array of structs instead of the QList, right?

wysota
28th October 2010, 14:59
If this is true, the problem would vanish if I used a plain old array of structs instead of the QList, right?
Not necessarily. Adding new items to the list may not be atomic so one of your threads (the reading one) may crash your program.

Cruz
28th October 2010, 15:06
I was thinking of a fixed size array of structs, not a list in this case. And the structs don't contain any pointers. So the worst thing that can happen is that the reading thread gets bogus data when trying to access elemets, that have not been written yet.

wysota
28th October 2010, 15:13
First you should ask yourself a question - what do you want to achieve exactly. I'm not sure why you are writing to the structure and want to read it at the same time. I'm assuming that you are probably loading data from some file and want to show the user a "live" representation of the structure as it gets filled. I don't understand though why are you so reluctant to use a mutex, it seems a perfect solution here. If you add a bit of smart logic around it, the final effect should be really good.

Cruz
28th October 2010, 16:14
I'm assuming that you are probably loading data from some file and want to show the user a "live" representation of the structure as it gets filled.

Precisely.



I don't understand though why are you so reluctant to use a mutex, it seems a perfect solution here. If you add a bit of smart logic around it, the final effect should be really good.

I was a little bit reluctant, because people say it slows things down. But mostly I wanted an explanation why it makes sense, because I didn't know that reading from the QList can trigger copying. The mutex is in place now and there is no noticable change in the performance. But now it bugs me that in the background the data structure gets copied thousands of times even though it's totally unnecessary.

wysota
28th October 2010, 16:29
I was a little bit reluctant, because people say it slows things down.
Only when threads try to enter the critical section at the same time but even if they do, you wouldn't even notice the slowdown. You get more slowdown by refreshing the gui thread too often.


But mostly I wanted an explanation why it makes sense, because I didn't know that reading from the QList can trigger copying.
I didn't say that.


The mutex is in place now and there is no noticable change in the performance. But now it bugs me that in the background the data structure gets copied thousands of times even though it's totally unnecessary.
So don't copy it.

Cruz
28th October 2010, 16:31
I didn't say that.

So then why does my data get corrupted in the first place?

wysota
28th October 2010, 16:39
I haven't seen your code, how could I answer your question?

Cruz
28th October 2010, 17:14
Ok, so the whole thread was pointless so far?

Can you please answer just this one specific question, that I already asked several times. If one thread is only reading from a QList and another thread is only writing, can it happen that at least one element of the QList gets corrupted for good? And again I emphasize that reading a half written element is NOT the point. There are errors in the data that remain even after the writing thread has finished its job.

wysota
28th October 2010, 17:19
Ok, so the whole thread was pointless so far?
I can't tell you what is wrong with your code because I didn't see it. I can only tell you what might be wrong with your code based on your description and that's what I did. Since adding a mutex solved the problem it is more likely that I was correct but I can't be sure because I haven't seen your code.


If one thread is only reading from a QList and another thread is only writing, can it happen that at least one element of the QList gets corrupted for good?
Yes, I believe so since QList is not declared thread-safe for a reason.

As I said at the very beginning, I don't see the point of all this discussion, there is a simple rule of thumb that accessing non-thread-safe data from more than one thread requires proper synchronisation.

Cruz
28th October 2010, 17:33
As I said at the very beginning, I don't see the point of all this discussion, there is a simple rule of thumb that accessing non-thread-safe data from more than one thread requires proper synchronisation.

And my reply to that is still the same. I want to understand why the problem occurs and not just stupidly follow a rule of thumb. And since the gui thread is only reading, I don't see how the data in the QList can be corrupted. If accessing an implicitely shared container by the [] operator can cause a copy of the data, then that would be a valid reason why it breaks. But it would be nice to have stronger confirmation than "it's not declared thread-safe". It's already not thread-safe if the reading is not guaranteed to work. But the reading is not the problem, the writing is. I will invest some time now and try to extract the relevant portion of the code, so that you can see it.

wysota
28th October 2010, 17:38
And my reply to that is still the same. I want to understand why the problem occurs and not just stupidly follow a rule of thumb.
Be my guest, dive in: http://qt.gitorious.org/qt/qt/trees/4.7/src


I don't see how the data in the QList can be corrupted.
The fact is it gets corrupted. So either you are wrong that the data can't be corrupted when one thread is reading it and the other is writing it or you are wrong that one thread is reading it and the other is writing it. I still have not seen your code, so I can only guess that somehow somewhere (maybe unwillingly) you are making a copy of the object which directs you to the path I was describing.


If accessing an implicitely shared container by the [] operator can cause a copy of the data, then that would be a valid reason why it breaks. But it would be nice to have stronger confirmation than "it's not declared thread-safe".
Ok, let's make my hint less subtle: Show us your code!

Cruz
28th October 2010, 17:59
So here is the code extract as it was before the mutex.




// One frame of data.
struct Frame
{
double x;
double y;
double z;
}

class FileLoader
{
// The data structure. A QList of Frame structs.
QList<Frame> data;
}

// This is the loader thread.
void FileLoader::run()
{
Frame f;

QFile file("data.dat");
file.open(QIODevice::ReadOnly);
QDataStream in(&file);

int frameCount = 0;
in >> frameCount;
for (int i = 0; i < frameCount; i++)
{
in >> f.x;
in >> f.y;
in >> f.z;

// Here is where a new frame is appended to the QList.
data << f;

if (i % 1000 == 0)
emit fileLoadingProgress(100*i/frameCount);
}
}


// The GUI class.
class GUI
{
FileLoader loader; // It has the FileLoader.
Frame* currentFrame; // And a pointer to the frame currently displayed on the screen.
double t; // This is a time parameter that is mapped to the data index.
double tscale; // This influences how fast the time grows (so the speed of the animation).
}

GUI::GUI(QWidget *parent)
{
loader.start(); // It starts the FileLoader thread in the constructor.
t = 0.0;
tscale = 1.0;
}

// The animate function is periodically called by a timer.
// It changes the currentFrame pointer by mapping the time t to a data index.
void GUI::animate()
{
if (t + tscale < loader.data.size()-1)
{
t = t + tscale; // t is a double so that it can grow with arbitrary speed.

currentFrame = &(loader.data[(int)t]); // <-- currentFrame pointer assignment.
}

draw();
}




Btw, I did not say that the mutex solved the problem. I just said I put it in and it didn't slow anything down. I need to observe the software for a while to see if the broken frames still occur.

wysota
28th October 2010, 18:18
As far as I can see your animate() method uses the list as an lvalue (or whatever it was called) so it's likely the list is copied there. You can try using the at() method instead but it will return a copy of the item in the list which will make your currentFrame pointer invalid (as you return a pointer to a temporary object).