Qt application hangs in xcb_wait_for_reply



ednakram
24th June 2017, 09:31
Hi All,
I am working on a Qt 4.6 based application on RedHat Linux 6 that shows several views in a main window (a tree, a table, a text edit, etc.). I recently added a "composite widget" to the status bar to control and view application details. It is composed of a widget that draws a progress bar and a number of QLabels showing some numbers.

I constantly update the sub-widgets inside the composite widget as the application receives notifications over a socket from another process; updates are deferred with a single-shot timer of 0.5 s.
When the application runs for a long time (4 hours at the minimum), the new widget that I added causes a slowdown.
The application crawls on any movement, and in all cases it is stuck in xcb_wait_for_reply, as seen in the attached stack trace (it was too big to paste here, so I made it an attachment).

I did an experiment: calling QLabel->setText in a loop (1 million times), I run into the same issue after about 500K iterations.
If I replace the QLabel with QPainter::drawText, I run into the same issue after about 2 million iterations.

To confirm this, I removed the QLabel->setText calls from my composite widget, but that didn't resolve the problem. Removing the progress bar's paint event instead (more on this progress bar later) doesn't resolve it either.
Only if I remove BOTH the QLabel->setText calls and the paint event of my composite widget's progress bar (a widget derived from QFrame with a custom paint event that draws 4 rectangles in different colors) does the problem go away, i.e. no hang in the test case.

What could be going wrong here? Is the X11 session getting bogged down by too many "draw" calls from QLabel and QPainter::drawText at the same time? Why does only the combination of the two cause an issue? Does this have something to do with the QTimer single-shot timer? Even when all notifications are done and the application is sitting idle, if I minimize and maximize it, it hangs with the same stack trace!

Any clues would be very helpful!

best,
Ednakram

ednakram
26th June 2017, 06:17
Bump up.

Additional question: I am using Qt 4.6.3, so should it even be using the xcb library? I think XCB was not a dependency of this version of Qt, as seen here: https://doc.qt.io/archives/4.6/requirements-x11.html

ednakram
28th June 2017, 01:52
Any clues ?

d_stranz
28th June 2017, 18:11
I think XCB is probably a red herring. This sounds like a classic case of a cumulative memory leak which eventually results in your app being unable to acquire more heap from the OS. It could be the problem originates in XCB because it is allocating memory, but the solution is really on your app side if you aren't deleting it as required.

ednakram
9th September 2017, 00:58
Hi @d_stranz, thanks for your comments. I may well be doing something wrong in the code, but I don't know what.
Do note a few things:
1. This starts to happen when the application has been running for a long time, ~40 hours.
2. There is plenty of free memory available on the machine, so heap is not a problem.
3. During the 40-hour run, my application takes just 90 seconds of CPU time, so it is not doing a lot of processing.
4. The problem is with the QMainWindow backing store, because this portion appears in many of the stack traces:

#41 0x00002aadb0112180 in QWidgetPrivate::syncBackingStore() () from /path/lib/libQtGui.so.4
#42 0x00002aadb0121022 in QWidget::event(QEvent*) () from /path//lib/libQtGui.so.4
#43 0x00002aadb0485013 in QMainWindow::event(QEvent*) () from /path//lib/libQtGui.so.4
#44 0x00002aadb00d508c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /path/Linux64/lib/libQtGui.so.4

Once this problem has started, even if I stop all processing in my application and hide all the widgets except for the status bar, menu, etc., a simple thing like clicking anywhere or resizing a splitter takes 3-4 seconds, with the stack trace showing the same snippet as above.

From the trace above I can see that the QMainWindow is getting a QEvent::UpdateRequest. This happens even when I click on a menu item while the window is always on top and is not being hidden and shown again.
Why should it be issuing an update request when nothing has changed?

Could this be a QT BUG that is triggered by my application?
Any more pointers to nail this down?

d_stranz
9th September 2017, 18:10
"Could this be a QT BUG that is triggered by my application?"

No. As I said before, it's either a memory leak or something that is acting like one: a list of things (like a message log) that is kept in memory and grows and grows, or some error (in your code) that results in a large data structure being copied repeatedly because it is passed to a function by value instead of by reference or pointer.

The backing store is a red herring, too. It is a symptom of the problem, not the cause. The cause is something you are doing in your code. If the backing store means that Qt is paging out RAM into a backup file on disk because it needs that RAM for something else, then that will cause an orders of magnitude slowdown. And if the backing store is required to do that, then again it is pointing to a memory leak in your code.

I am not that familiar with memory management on Linux platforms. In Windows, if a program asks for more heap than is available, the OS begins allocating memory from the page file (on disk). If Linux implements a similar virtual address space where you can ask for as much as you want, then you may not notice any decrease in the available heap, because the OS has given your program all the RAM it is willing to give it and future requests are served from a page file on disk. With a memory leak, eventually your program becomes "swap bound", where the OS spends most of its time swapping RAM in and out of the page file as your program asks for things that aren't in RAM at that moment.

There are memory checkers for Linux development environments, aren't there? Instrument your code with one, run it for a few hours, and see what it says about where memory is being allocated and (not) deallocated. You don't need to run it for 40 hours, because whatever problem there is will occur over the short term as well; it just won't have an obvious effect on the performance.

ednakram
13th September 2017, 02:16
Thank you @d_stranz. I will investigate the memory leak aspect.

So, as I understand it, your hypothesis is that the backing store is always involved in paint events on the widget; it just gets "slow" in this case because swapping has put it on disk instead of in memory.

d_stranz
13th September 2017, 18:25
I am guessing, but the backing store is probably used to keep a bitmap image of the window so it can be quickly repainted when some part of it is covered by another window or GUI item. (Think of moving the mouse cursor across a window: the areas it covers have to be repainted after it moves on. It is simplest to keep a copy of the window and BITBLT it back into place when needed.)

If the backing store can be kept in RAM, this repainting is unnoticeable because it happens so quickly. If it has to be swapped in from disk, it will be very slow.

ednakram
23rd September 2017, 08:34
Hi @d_stranz,
I found the reason for the slowdown. Let me first thank you for pointing to the "possibility" of memory leaks.

The problem was not "exactly" a memory leak in the application but one in the XServer.

I had a widget that was derived from QFrame but also had its own "paint" event, as follows (paintProgress basically draws 2 colored rectangles to show progress).




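void paintEvent(QPaintEvent *pe)
{
    // Painter opened on this widget; it stays active until the end of the function.
    QPainter painter(this);

    painter.save();
    painter.translate(rect().left(), rect().top());

    // w, h, margin and m_data come from elsewhere in the widget (not shown in the post).
    paintProgress(&painter, w, h, margin, m_data);

    painter.restore();

    // Base-class painting is invoked last, while `painter` above is still alive.
    QFrame::paintEvent(pe);
}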


I used xtrace to trace all the communication between the client (the machine on which the application runs) and the XServer (the machine on which the XServer, i.e. the VNC server, is running).
It is a whole lot of data, but the only thing I did was count the number of CreateGC and FreeGC requests, the two requests that create a graphics context and free it. Running my progress bar for just 1 second results in ~7000 calls to these functions. The only problem is that the numbers are not exactly the same: FreeGC is called 100 fewer times than CreateGC, which is a memory leak on the XServer.

I changed this widget to derive from QWidget instead of QFrame and let it own the painting process completely, i.e. call setAttribute(Qt::WA_OpaquePaintEvent, true);
When I ran with this patch, the numbers of CreateGC and FreeGC calls matched EXACTLY, just as they match if I completely "hide" the widget, i.e. when no painting is happening.
My hypothesis is that if the widget is NOT painting OPAQUELY, the paint engine first "clears" the widget every time it has to paint it, and this somehow causes a GC not to be freed (I don't know exactly how yet).
What do you think?

With the change described above, I have been running the application for ~24 hours without any hang; hopefully it will stay that way for longer.

Thanks a ton again!

d_stranz
23rd September 2017, 21:04
"What do you think?"

No idea - I have no experience with XServer. But great detective work, and I am glad you were able to find and fix the problem.

To avoid the "clearing" problem, try calling QWidget::setAutoFillBackground() with false in your widget's constructor.

I also don't think you should be calling the base class paintEvent() from within your own paintEvent() (either QWidget::paintEvent() or QFrame::paintEvent()). I think paint events are supposed to be more or less self-contained. You might try going back to your original code and trying the same thing with the QFrame::paintEvent() call commented out to see if you get the same set of unmatched GC calls.

ednakram
25th September 2017, 01:51
Hi @d_stranz, you mean to setAutoFillBackground to true? Because default is already false.

Added after 6 minutes:

By the way, setAttribute(Qt::WA_OpaquePaintEvent, true); has a side effect: it leaves some artifacts behind when resizing the window etc.

I removed this, and the CreateGC and FreeGC counts still remain equal, so the problem was actually with calling QFrame::paintEvent from my widget. I was doing it to "raise" the frame when the mouse hovers over the widget; I will do this in another way now. I didn't know that calling the base class paint event from a derived class could cause problems. I had seen many examples where this was being done.

d_stranz
25th September 2017, 16:58
"you mean to setAutoFillBackground to true? Because default is already false."

Huh, my mistake. I thought the default is "true".


"I had seen many examples where this was being done."

I have seen it with non-paintEvent() methods, but it is rare to see it in a paintEvent(). If you do see it, usually it is the first call in the paintEvent() handler so that the custom widget can paint over areas painted by the base class. If you call it last, there is the possibility that it will simply wipe out your own painting. Depends on the widget, of course.