Bizarre behavior under X (Qt 4.3.0)



slcotter
25th July 2007, 16:47
This is about Qt 4.3.0 running on a Fedora Core 6 system with fully updated libraries (upgraded with the provided "Software Updater" utility).

I've written a somewhat simple Qt program to receive data from another computer via TCP. This action occurs in a worker QThread. The data is passed into a global data class. The GUI reads this data, draws a graph, and displays various things. I think the details are likely irrelevant to my problem.

After an arbitrary amount of time (anywhere from 15 minutes to half a day), the GUI freezes and begins spitting out the following cluster of error messages (and it will continue to do so until killed):


X Error: BadIDChoice (invalid resource ID chosen for this connection) 14
Major opcode: 55 (X_CreateGC)
Resource id: 0x10000000
X Error: BadGC (invalid GC parameter) 13
Major opcode: 59 (X_SetClipRectangles)
Resource id: 0x10000000
X Error: BadGC (invalid GC parameter) 13
Major opcode: 56 (X_ChangeGC)
Resource id: 0x10000000
X Error: BadGC (invalid GC parameter) 13
Major opcode: 62 (X_CopyArea)
Resource id: 0x10000000
X Error: BadGC (invalid GC parameter) 13
Major opcode: 60 (X_FreeGC)
Resource id: 0x10000000


The program does not crash, so it's proven somewhat difficult to debug. I've run the program under valgrind, and this is the only error message produced:


==8886== Syscall param writev(vector[...]) points to uninitialised byte(s)
==8886== at 0x3F502C7923: writev (in /lib64/libc-2.5.so)
==8886== by 0x3F5164646B: (within /usr/lib64/libX11.so.6.2.0)
==8886== by 0x3F5164B2FE: _XSend (in /usr/lib64/libX11.so.6.2.0)
==8886== by 0x3F5163BE2D: (within /usr/lib64/libX11.so.6.2.0)
==8886== by 0x3F5163BFCA: XPutImage (in /usr/lib64/libX11.so.6.2.0)
==8886== by 0x4E8B25C: QPixmap::fromImage(QImage const&, QFlags<Qt::ImageConversionFlag>) (qpixmap_x11.cpp:1172)
==8886== by 0x4EB9BE3: QPaintEngine::drawImage(QRectF const&, QImage const&, QRectF const&, QFlags<Qt::ImageConversionFlag>) (qpaintengine.cpp:516)
==8886== by 0x4F569C2: QX11PaintEngine::drawImage(QRectF const&, QImage const&, QRectF const&, QFlags<Qt::ImageConversionFlag>) (qpaintengine_x11.cpp:1560)
==8886== by 0x4EC6725: QPainterPrivate::draw_helper(QPainterPath const&, QPainterPrivate::DrawOperation) (qpainter.cpp:233)
==8886== by 0x4EC8901: QPainter::drawRects(QRect const*, int) (qpainter.cpp:2660)
==8886== by 0x4DFD1E7: QPainter::drawRect(QRect const&) (qpainter.h:557)
==8886== by 0x4EC8B6B: QPainter::fillRect(QRect const&, QBrush const&) (qpainter.cpp:5302)
==8886== by 0x5104734: QCleanlooksStyle::drawControl(QStyle::ControlElement, QStyleOption const*, QPainter*, QWidget const*) const (qcleanlooksstyle.cpp:2134)
==8886== by 0x51D8232: QMenuBar::paintEvent(QPaintEvent*) (qmenubar.cpp:928)
==8886== by 0x4E098FB: QWidget::event(QEvent*) (qwidget.cpp:6163)
==8886== by 0x51D9B3F: QMenuBar::event(QEvent*) (qmenubar.cpp:1344)
==8886== by 0x4DB0860: QApplicationPrivate::notify_helper(QObject*, QEvent*) (qapplication.cpp:3538)
==8886== by 0x4DB26FF: QApplication::notify(QObject*, QEvent*) (qapplication.cpp:3479)
==8886== by 0x5BE7F8F: QCoreApplication::notifyInternal(QObject*, QEvent*) (qcoreapplication.cpp:509)
==8886== by 0x4DBD85A: QCoreApplication::sendSpontaneousEvent(QObject*, QEvent*) (qcoreapplication.h:189)
==8886== by 0x4E16D46: qt_sendSpontaneousEvent(QObject*, QEvent*) (qapplication_x11.cpp:4367)
==8886== by 0x4F6B2DD: QWidgetPrivate::drawWidget(QPaintDevice*, QRegion const&, QPoint const&, int) (qbackingstore.cpp:1126)
==8886== by 0x4F6BB5F: QWidgetBackingStore::paintSiblingsRecursive(QPaintDevice*, QList<QObject*> const&, int, QRegion const&, QPoint const&, int) (qbackingstore.cpp:1031)
==8886== by 0x4F6B6A3: QWidgetPrivate::drawWidget(QPaintDevice*, QRegion const&, QPoint const&, int) (qbackingstore.cpp:1162)
==8886== by 0x4F6BFEB: QWidgetBackingStore::cleanRegion(QRegion const&, QWidget*, bool) (qbackingstore.cpp:934)
==8886== by 0x4F6C816: qt_syncBackingStore(QWidget*) (qbackingstore.cpp:312)
==8886== by 0x4E09F01: QWidget::event(QEvent*) (qwidget.cpp:6305)
==8886== by 0x4DB0860: QApplicationPrivate::notify_helper(QObject*, QEvent*) (qapplication.cpp:3538)
==8886== by 0x4DB26FF: QApplication::notify(QObject*, QEvent*) (qapplication.cpp:3479)
==8886== by 0x5BE7F8F: QCoreApplication::notifyInternal(QObject*, QEvent*) (qcoreapplication.cpp:509)
==8886== by 0x4DAD944: QCoreApplication::sendEvent(QObject*, QEvent*) (qcoreapplication.h:186)
==8886== by 0x5BE8542: QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) (qcoreapplication.cpp:1085)
==8886== by 0x5BE87C1: QCoreApplication::sendPostedEvents(QObject*, int) (qcoreapplication.cpp:970)
==8886== by 0x5C151F3: postEventSourceDispatch(_GSource*, int (*)(void*), void*) (qeventdispatcher_glib.cpp:194)
==8886== by 0x3F5122CF63: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.1200.9)
==8886== by 0x3F5122FD9C: (within /lib64/libglib-2.0.so.0.1200.9)
==8886== by 0x3F512302CD: g_main_context_iteration (in /lib64/libglib-2.0.so.0.1200.9)
==8886== by 0x5C146F9: QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (qeventdispatcher_glib.cpp:325)
==8886== by 0x4E52F3E: QGuiEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (qguieventdispatcher_glib.cpp:178)
==8886== by 0x5BE4B44: QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (qeventloop.cpp:126)
==8886== Address 0x98CB015 is 125 bytes inside a block of size 16,384 alloc'd
==8886== at 0x4A04BA2: calloc (vg_replace_malloc.c:279)
==8886== by 0x3F51637166: XOpenDisplay (in /usr/lib64/libX11.so.6.2.0)
==8886== by 0x4E27AE2: qt_init(QApplicationPrivate*, int, _XDisplay*, unsigned long, unsigned long) (qapplication_x11.cpp:1519)
==8886== by 0x4DB9840: QApplicationPrivate::construct(_XDisplay*, unsigned long, unsigned long) (qapplication.cpp:696)
==8886== by 0x4DBA983: QApplication::QApplication(int&, char**, QApplication::Type, int) (qapplication.cpp:672)
==8886== by 0x41107A: main (main.cpp:17)


Might this be related to my X error?

There are also a few reported memory leaks in places that don't make much sense to me. They're generally small things that don't grow over time, so I won't include them right now.

I've taken the step of recompiling Qt without Xrender and Xcursor enabled, but all this seemed to do was slow the appearance of the bug. Previously it would happen ~15-60 minutes into execution, now it makes an appearance after many hours.

So, my questions are these:
What is resource ID 0x10000000? That sounds like a base address to me, so am I referencing address zero within some block? What does this mean?

Are valgrind-listed memory leaks within Qt modules something I should worry about? Might they be related to this problem?

And, lastly, what important information have I failed to provide in this post?

Thanks for your time.
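
On the resource ID question: in the X11 protocol a resource ID is an identifier the client picks for a new object such as a GC, not a memory address. At connection setup the server gives each client a resource-id-base and a resource-id-mask, valid IDs are the base combined with bits taken from the mask, and the top three bits are always zero. BadIDChoice means the chosen ID falls outside that range or is already in use. A minimal sketch of the range check follows; the function name is hypothetical and the base and mask values in the usage note are made-up examples, not values read from this system.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `id` lies in the ID range a server assigned to a client.
// `base` and `mask` stand for the resource-id-base and resource-id-mask
// fields of the X11 connection setup reply. Hypothetical helper, not an
// Xlib or Qt function.
bool idInClientRange(std::uint32_t id, std::uint32_t base, std::uint32_t mask)
{
    // The base must not overlap the bits the client is free to choose.
    assert((base & mask) == 0);

    // The protocol requires the top three bits of every resource ID to be zero.
    if (id & 0xE0000000u) {
        return false;
    }
    // Every bit outside the mask must match the base exactly.
    return (id & ~mask) == base;
}
```

With an assumed base of 0x02400000 and mask of 0x001FFFFF, idInClientRange(0x02400001, ...) is true while idInClientRange(0x10000000, ...) is false, which matches the kind of ID the server rejected in the log above.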

marcel
25th July 2007, 16:58
Are valgrind-listed memory leaks within Qt modules something I should worry about? Might they be related to this problem?

No. Trolltech has warned that some profilers may complain about possible memory leaks. That is not the cause.



And, lastly, what important information have I failed to provide in this post?
Thanks for your time.
You did not post the code :).

The thing is, the most likely cause of this is a memory leak in your own code.
Check whether you repeatedly allocate something on the heap and forget to deallocate it.

Do you have a lot of pixmaps in your application?
Try enlarging the QPixmapCache size, although this is nowhere near a plausible cause. It is just a hypothesis.

Regards

slcotter
25th July 2007, 17:33
I don't have *any* pixmaps, to my knowledge. They must be implicit in something I do.

I use a lot of QLabels for displaying text and a QPainter object to draw some lines and some more text. That's it.

And a QMainWindow object to encapsulate the whole deal. I'll bundle up some relevant code and post it in a bit.

slcotter
25th July 2007, 18:23
Here's what I believe is sufficient relevant code.

My display is a MainWindow display, so it creates a menu bar, a status bar, and a central widget. I've left out the implementation of the menu bar. The status bar is completely unused. The central widget is split into two sub-widgets: a "left pane" and a "data window".

The left pane is a collection of QLabels placed in a grid layout. The constructor for the left pane widget class is included. All functions that deal with this widget's behaviour call only the setText() or setPalette() methods, so they are not included.

The "data window" displays one of five sets of data in a histogram format. The paintEvent function branches to one of five different functions depending on the dataType variable. They're all functionally equivalent (differing only in which data sets they draw from and the descriptive text they draw), so I've only included one as an example.

If any other snippets of code would prove useful, please say so.

marcel
25th July 2007, 18:40
Any potential memory leaks happen in "data", whatever it is...
Nothing wrong in the code you posted. No extraordinary allocations, nothing.

Do you use any threads in your app? How is data created/populated?

Regards

slcotter
25th July 2007, 19:48
I do use threads. One thread, to be precise.

There are 6 top-level objects: QApplication, QMainWindow, QThread, QTcpSocket, "Data", QMutex.

"Data" is basically just a glorified container. It stores an unlimited number of "data packets" which come from the other computer, and it stores the values of all of the states that I want to keep track of. According to valgrind, no memory leaks show up inside this code.

Data is populated in the following way:
The readyRead signal from the TcpSocket class is connected to a slot in my worker QThread which reads the buffer and interprets the meaning. The packets sent by the other computer will contain either experimental data, experiment status information, or program state information. This function then enters this information into the "data" container class and emits a signal to the MainWindow requesting an update of the appropriate QLabel (if any). This is done by a QueuedConnection, which I understand to be required for a non-GUI thread talking to the main (GUI) thread.
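
The arrangement described above (one thread filling a shared container, another reading it, with a mutex among the top-level objects) can be sketched roughly as follows. This is a minimal sketch using the C++ standard library rather than QMutex so that it is self-contained; SharedData and runProducerDemo are hypothetical names, and the original "Data" class was not posted, so none of this is the poster's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the "Data" container: every access goes through
// the mutex, so the worker and the GUI never touch the vector simultaneously.
class SharedData {
public:
    // Called from the worker thread for each decoded packet.
    void append(int sample)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.push_back(sample);
    }

    // Called from the reading thread: copies the data out under the lock so
    // that drawing can proceed without holding it.
    std::vector<int> snapshot() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> samples_;
};

// Simulates the worker: appends `count` samples from a second thread while
// the caller keeps taking snapshots, then returns the final sample count.
std::size_t runProducerDemo(int count)
{
    assert(count >= 0);
    SharedData data;
    std::thread worker([&data, count] {
        for (int i = 0; i < count; ++i) {
            data.append(i);
        }
    });
    for (int i = 0; i < 100; ++i) {
        (void)data.snapshot();  // reads overlap with the worker's writes
    }
    worker.join();
    return data.snapshot().size();
}
```

Copying under the lock and painting from the copy keeps the lock hold time short, at the price of an extra allocation per repaint. Whether that trade-off suits a container that grows without bound is a separate question.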

marcel
25th July 2007, 20:01
The readyRead signal from the TcpSocket class is connected to a slot in my worker QThread which reads the buffer and interprets the meaning.

I assume you called exec() to start the event loop for the thread.

But you also have to move the thread to its own context, such that all its slots will execute in the thread, not in the GUI thread. This also could be a cause of freezing.
This can be done via QObject::moveToThread.


workerThread->moveToThread(workerThread);

This must be done after the worker thread has started.

Since you cannot post more code, it remains your task to examine your code further and identify possible memory leaks.

But first do this modification and test it. There is a good chance for this to work. It has been encountered before.

Regards

slcotter
25th July 2007, 20:08
Well, I can post more code, but I'm really uncertain about which parts are relevant... there's a lot of it and I don't really think posting it all is a fair use of your time. :p I'm not a programmer by trade, so how to discuss these things isn't really clear to me.

Could you expand on what it means to "move the thread into its own context?"

slcotter
25th July 2007, 20:16
I've made your suggested change and rebuilt my code. Sadly, I won't know until I run an over-night test whether the issue remains. (How fun it is to debug a problem that doesn't show up for 12+ hours.)

And, to answer a question that I failed to with my last post, yes, I do call exec() within the run() method.

marcel
25th July 2007, 20:19
Oh, you are more like a consultant?



Could you expand on what it means to "move the thread into its own context?"

It has been said in other threads. Namely here: http://www.qtcentre.org/forum/f-qt-programming-2/t-thread-freezing-gui-7322.html, starting from post 15.

But here goes again:
You create the worker thread in the GUI thread. No problem so far.
You connect a signal from the GUI thread to a slot from the worker thread.
Here is the problem, since the signal will actually be posted as an event in the thread's event loop.
But actually, since the thread is owned by the GUI thread, the event processing for it will take place in the GUI thread. Therefore the slot will actually be executed in the context of the main thread.

By calling moveToThread from the GUI thread, you guarantee that any events posted in the thread will get executed in the thread's event handler.
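
The point of the explanation above is that a queued call runs on whichever thread processes the queue it was posted to, not on the thread that posted it. A rough standard-library analogy is sketched below. TaskQueue and slotRunsOnDrainingThread are hypothetical names, and this is an illustration of the idea only, not how Qt implements its event loop or thread affinity.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical miniature of an event queue: any thread may post a task, but
// the task runs on whichever thread calls drain().
class TaskQueue {
public:
    void post(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Runs all pending tasks on the calling thread.
    void drain()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (tasks_.empty()) {
                    return;
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // executed outside the lock
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Posts a "slot" from the calling thread, drains the queue on a second
// thread, and reports whether the slot ran on that second thread.
bool slotRunsOnDrainingThread()
{
    TaskQueue queue;
    std::thread::id ranOn;
    queue.post([&ranOn] { ranOn = std::this_thread::get_id(); });

    std::thread::id workerId;
    std::thread worker([&queue, &workerId] {
        workerId = std::this_thread::get_id();
        queue.drain();
    });
    worker.join();

    assert(ranOn != std::thread::id());  // the slot has run by now
    return ranOn == workerId && ranOn != std::this_thread::get_id();
}
```

In this analogy, changing an object's thread affinity corresponds to choosing which queue its slot invocations are posted to, and therefore which thread ends up executing them.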

Regards

slcotter
25th July 2007, 21:01
Student, actually. It's all academic applications so everything I write is in the public domain. If you want to see any part of my code that deals with.... whatever, just say "I'd like to see the parts that do <this>" and I can make it happen.

I could post it all, too, but then you'd all see what an awful amateur I am. :o

marcel
25th July 2007, 21:03
No problem. See if it works this way first...
If not, then we can look further, in the rest of the code.

Regards

slcotter
25th July 2007, 21:08
I'll be back tomorrow with an update, then.

Thanks very much for your time.

slcotter
30th July 2007, 21:36
As a delayed followup:

The suggestion failed, and the program displayed similar behavior even after being moved into its own context. However, I have worked around the problem.

I've rolled back to an older version of Linux (Scientific Linux (https://www.scientificlinux.org/) 4.0, based upon some version of Red Hat Enterprise Linux, I believe).

I'm also using older hardware (Athlon XP, versus the 64-bit dual core Athlon whateveritwas on the other system).

The code now appears to work properly. I may revisit this at some point in the future, but for now I have to move forward with other projects... presuming that it continues to function as it has over the last few days.

I can only hope that this info proves useful to someone in a similar situation in the future. It appears that my issue may be unique to the specific platform I was developing on (Fedora Core 6, 64-bit version).

marcel
30th July 2007, 21:42
Well, this sucks.

Since you noticed that it works OK on a single-CPU machine, I assume it is a problem related to concurrency.
On the single-core machine, the GUI thread and the worker will never be scheduled to run concurrently, while on the dual-core CPU there were plenty of chances for them to run at the same time.

This leads to an assumption about deadlocks and/or memory violations on shared areas.
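
As an illustration of the deadlock half of that assumption: if two threads each need two locks and take them in opposite orders, they can block each other forever once they truly run in parallel. One common remedy is to acquire both locks in a single step. The sketch below uses std::lock from the C++ standard library; Region, transfer, and runOpposingTransfers are hypothetical names, and nothing here is taken from the poster's program, whose locking code was not shown.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical pair of shared regions, each protected by its own mutex.
struct Region {
    std::mutex mutex;
    int value = 0;
};

// Moves `amount` from one region to the other. std::lock acquires both
// mutexes with a deadlock-avoidance algorithm, so callers may name the
// regions in either order.
void transfer(Region& from, Region& to, int amount)
{
    assert(&from != &to);  // locking the same mutex twice would be undefined
    std::lock(from.mutex, to.mutex);
    std::lock_guard<std::mutex> lockFrom(from.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(to.mutex, std::adopt_lock);
    from.value -= amount;
    to.value += amount;
}

// Two threads transfer in opposite directions; returns the combined total,
// which stays constant if no update was lost.
int runOpposingTransfers(int rounds)
{
    Region a;
    Region b;
    a.value = 1000;
    b.value = 1000;
    std::thread t1([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) {
            transfer(a, b, 1);
        }
    });
    std::thread t2([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) {
            transfer(b, a, 1);
        }
    });
    t1.join();
    t2.join();
    return a.value + b.value;
}
```

Replacing the std::lock call with two separate lock() calls in argument order would make the two threads lock in opposite orders, which is the pattern that can hang on a multi-core machine and appear to work on a single core.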


Unfortunately, I don't have a dual core cpu, otherwise I would have tested your code.
I suggest using Intel Thread Checker if you ever get the chance to run it on an Intel machine.

Regards