Performance optimizations using Raster backend



tothphu
8th December 2015, 02:30
Hi Everyone,

I'm trying to optimize a Qt application, but came across a few roadblocks along the way and I'm wondering if you can shed some light on them.

Some parameters of what we do: raster engine on bare-bones Linux writing to the framebuffer (RGB32 destination, I believe; I'm open to changing it if you can tell me how. Documentation is scarce.). Qt 4.7.1 (can't change at this very moment, but a move is planned). 400MHz CPU. We only use drawPixmap calls to draw images, and all QImage-s are converted to QPixmap outside of paint events. There is also one screen update per second.

I've read these pages:
http://blog.qt.io/blog/2009/12/16/qt-graphics-and-performance-an-overview/
http://blog.qt.io/blog/2009/12/18/qt-graphics-and-performance-the-raster-engine/
and similar ones, including the docs.

- we use many Pixmaps with alpha blending. In order to improve performance I've tried to change our QImage-s (from ARGB32) to all use ARGB32_Premultiplied or RGB16 (which I believe is ARGB4444_Premultiplied in Qt). The problem is that my draw times actually increased in both cases instead of decreasing. When I run callgrind I see /src/gui/painting/qdrawhelper_p.h:void qt_transform_image_rasterize<unsigned int, unsigned int, Blend_ARGB32_on_ARGB32_SourceAlpha> calls outnumbering everything else we do. When I convert an ARGB32 QImage to QPixmap, what is the QPixmap's structure? Is the answer the same when my QImage is ARGB4444_Premultiplied?
- as I use a raster device, is there a difference between using drawImage vs drawPixmap? I understand that if I'd have OpenGL QPixmap would be in GPU memory, but not in this case. I'm already underway with testing this, but can I reasonably expect anything?
- I'm wondering about best practices for using masks. When would you get the mask from a pixmap, or create a HeuristicMask or AlphaMask? As expected, I'm interested in speed. Also, when would you use a mask?
- I have a couple of CONFIG options enabled for FB device like CONFIG_FB_CFB_FILLRECT=y, does Qt use those?
- http://doc.qt.io/qt-4.8/qpixmap.html#fromImage says: "If this is too expensive an operation, you can use QBitmap::fromImage() instead." What does it mean by that? Why is QBitmap faster and what do I draw where in order to improve speed?
- when I rotate the canvas I can observe that the time to draw depends on the angle of rotation, 0 and 180 degrees being the fastest and 90 and 270 being the slowest (5 times slower). Is that expected? I can craft a theory that explains it: rows and columns swap, which is quite problematic for memory access, and cache effects screw everything up. But I'm wondering if it is correct.
- any ideas about potential speed up possible with Qt 4.7.1 -> 5.5 change? (it is a bit of a mission to invest all that time to make the change)
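The row/column theory in the rotation question above can be illustrated with a minimal sketch. This is not Qt's rasterizer; the `Image` struct and both functions are hypothetical names made up for the illustration, and the pixels are assumed to be 32-bit values stored row by row. It only shows the access pattern: at 0 degrees consecutive destination pixels read consecutive source pixels, while at 90 degrees each step along a destination row jumps a whole source row in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type: 32-bit pixels stored row by row.
struct Image {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
};

// 0 degrees: source and destination are both walked along rows,
// so every read and write touches the next address in memory.
Image copyUnrotated(const Image& src) {
    assert(src.pixels.size() == static_cast<std::size_t>(src.width) * src.height);
    Image dst{src.width, src.height, std::vector<std::uint32_t>(src.pixels.size())};
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * src.width + x;
            dst.pixels[i] = src.pixels[i];
        }
    }
    return dst;
}

// 90 degrees clockwise: the destination is still written row by row,
// but each step in dx moves one full row backwards in the source
// (a stride of src.width pixels), which is unfriendly to the cache.
Image rotate90Clockwise(const Image& src) {
    assert(src.pixels.size() == static_cast<std::size_t>(src.width) * src.height);
    Image dst{src.height, src.width, std::vector<std::uint32_t>(src.pixels.size())};
    for (int dy = 0; dy < dst.height; ++dy) {
        for (int dx = 0; dx < dst.width; ++dx) {
            const int sx = dy;                   // source column
            const int sy = src.height - 1 - dx;  // source row
            dst.pixels[static_cast<std::size_t>(dy) * dst.width + dx] =
                src.pixels[static_cast<std::size_t>(sy) * src.width + sx];
        }
    }
    return dst;
}
```

A 180 degree rotation would read the source rows backwards but still contiguously, which would be consistent with 0 and 180 degrees being equally fast. Whether this accounts for the full factor of 5 on the target can only be settled by measuring there.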

I don't think it makes a difference whether I'm including code snippets. Ask questions and I can answer those.

Any help is appreciated!

Thanks,
Peter

anda_skoa
8th December 2015, 12:07
When I convert an ARGB32 QImage to QPixmap, what is the QPixmap's structure? Is the answer the same when my QImage is ARGB4444_Premultiplied?

The QPixmap internals are very system specific. For the raster graphics system they might be the same.



- as I use a raster device, is there a difference between using drawImage vs drawPixmap? I understand that if I'd have OpenGL QPixmap would be in GPU memory, but not in this case. I'm already underway with testing this, but can I reasonably expect anything?

Originally the QPixmap object was meant to hold a system-specific image buffer, e.g. an X11 Pixmap on X11, potentially living in the process of the windowing system itself and accessed by the application through means like shared memory.

For raster it could actually be just another image, so drawImage might be more efficient since it doesn't require conversion.

But as always with performance the only way to tell is to benchmark.



- any ideas about potential speed up possible with Qt 4.7.1 -> 5.5 change? (it is a bit of a mission to invest all that time to make the change)

While I think the raster implementation in Qt 5 is more optimized (due to being the default for all platforms now), you should do your own profiling and benchmarks.
Since you already need a benchmark program for the drawImage vs drawPixmap case, it shouldn't be too difficult to make it also build with Qt 5 and get some real-world numbers for that case as well.

Cheers,
_

tothphu
8th December 2015, 22:40
Do you mean that the backend might even use ARGB32, not ARGB32_Premultiplied? By switching everything to ARGB32_PM I've managed to reduce calls to convert_ARGB_to_ARGB_PM, but it didn't seem to make a difference in runtimes. I did a bit of searching in the Qt source and it seems that only the ARGB32_PM format has blending functions (apart from the non-transparent formats), which means that no matter what I do it will convert everything to that format. The best I can do is always use ARGB32_PM formats from the beginning. It seems to be the same for 5.5, except there is an A2RGB30 format, which seems to be a good addition for my case.
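To make the conversion and blending steps mentioned above concrete, here is a minimal sketch of what premultiplying and a source-over blend look like on 32-bit ARGB values. It is not Qt's code: `mul255`, `premultiply` and `blendSourceOver` are hypothetical names for this illustration, and the rounding is a simple choice rather than whatever the raster engine uses. The point is that once the source is premultiplied, the blend needs one multiply per channel (on the destination only); a straight-alpha source has to be converted first, which is the extra work a convert_ARGB_to_ARGB_PM style step represents.

```cpp
#include <algorithm>
#include <cstdint>

// Multiply an 8-bit channel by an 8-bit alpha and scale back to 0..255,
// with simple rounding (an assumption of this sketch).
constexpr std::uint32_t mul255(std::uint32_t channel, std::uint32_t alpha) {
    return (channel * alpha + 127) / 255;
}

// Convert one straight-alpha ARGB32 pixel to premultiplied form:
// each color channel is scaled by alpha, alpha itself is unchanged.
constexpr std::uint32_t premultiply(std::uint32_t argb) {
    const std::uint32_t a = argb >> 24;
    const std::uint32_t r = mul255((argb >> 16) & 0xFF, a);
    const std::uint32_t g = mul255((argb >> 8) & 0xFF, a);
    const std::uint32_t b = mul255(argb & 0xFF, a);
    return (a << 24) | (r << 16) | (g << 8) | b;
}

// Source-over blend of two premultiplied pixels:
// result = src + dst * (255 - srcAlpha) / 255, per channel (alpha included).
constexpr std::uint32_t blendSourceOver(std::uint32_t dst, std::uint32_t src) {
    const std::uint32_t inv = 255 - (src >> 24);
    std::uint32_t out = 0;
    for (int shift = 0; shift <= 24; shift += 8) {
        const std::uint32_t s = (src >> shift) & 0xFF;
        const std::uint32_t d = (dst >> shift) & 0xFF;
        // Clamp in case the input was not valid premultiplied data.
        out |= std::min<std::uint32_t>(255, s + mul255(d, inv)) << shift;
    }
    return out;
}
```

For example, half-transparent red `0x80FF0000` premultiplies to `0x80800000`, and blending that over opaque blue `0xFF0000FF` gives `0xFF80007F`. If every image is already stored premultiplied, only `blendSourceOver` runs per pixel at draw time.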

My profiling is quite tricky because I can't profile on the target; I have to profile on a different Qt version on X11. On the target I can only measure the time of some calls.

I guess drawImage wouldn't make a difference, as the only way to blend images AFAIK is through drawImage calls, which would use the same backend blending functions.