Drawing lines in QPainter takes a lot of time



xcxl
4th June 2019, 12:47
Hi,

I currently have a problem with line drawing in QPainter. I draw "only" 2000 simple lines, but each update takes about 50 ms. My problem is that all the lines move continuously to draw a nice, slowly rotating graph (note: it's not just a rotate()).
To check, I run only the following code:


void Widget::paintEvent(QPaintEvent *event)
{
    QElapsedTimer chronoPainter;

    QPainter myPainter(this);
    const int cst_nbLines = 5000;

    chronoPainter.start();
    for (int i=0 ; i<cst_nbLines ; i++)
    {
        QLine myLine(i/10, i/10, i/10+300, i/10);
        myPainter.drawLine(myLine);
    }
    qint64 elapsed_ns = chronoPainter.nsecsElapsed();
    qint64 elapsed_ms = elapsed_ns/1000000L;

    qDebug() << "End, time to paint : " << elapsed_ms << "ms";
}


And drawing my 2000 very simple lines takes... 50 ms. No way to get a 50 FPS animation with that (50 ms per frame is only 20 FPS). Do you have any idea how I can draw all the lines faster?
I've tried with


QVector<QLine> linesVector;
for (int i=0 ; i<cst_nbLines ; i++)
{
    linesVector.append(QLine(i/10, i/10, i/10+300, i/10));
}
myPainter.drawLines(linesVector);
but it does not help (same drawing time).

Thank you!

Lesiok
4th June 2019, 13:36
Did you know that in your example, you draw every line 10 times?

xcxl
4th June 2019, 13:51
Did you know that in your example, you draw every line 10 times?
Hummm no, I didn't know... I just added a counter (counter+=1 just after the drawLine()), and I really have 5000 iterations. Could you explain why you said I draw it 10 times, please?

Lesiok
5th June 2019, 07:45
Because i is an integer, for i in <0,9> i/10 == 0.
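
A small sketch of the point being made here, written in plain C++ so it compiles without Qt. The function name countDistinctRows is hypothetical; it only counts how many different values i/10 takes over the loop, which is the number of distinct lines the test code really draws.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts how many distinct values "i / divisor" takes for i in [0, lineCount).
// With int operands, '/' truncates, so ten consecutive values of i map to
// the same coordinate when divisor is 10.
std::size_t countDistinctRows(int lineCount, int divisor)
{
    assert(divisor > 0);
    std::set<int> rows;
    for (int i = 0; i < lineCount; ++i)
        rows.insert(i / divisor);   // 0..9 -> 0, 10..19 -> 1, ...
    return rows.size();
}
```

Calling countDistinctRows(5000, 10) returns 500: the loop still runs 5000 times, but only 500 different lines are produced, each one drawn ten times.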

anda_skoa
7th June 2019, 08:46
QVector<QLine> linesVector;
for (int i=0 ; i<cst_nbLines ; i++)
{
    linesVector.append(QLine(i/10, i/10, i/10+300, i/10));
}
myPainter.drawLines(linesVector);


Likely not directly responsible, but call QVector::reserve() before that loop.
Otherwise you could get 5000 memory reallocations.

Cheers.
_

ChristianEhrlicher
8th June 2019, 19:01
Otherwise you could get 5000 memory reallocations.

Not 5000 but also not only 1.
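
To make the "not 5000 but also not only 1" remark concrete, here is a rough sketch that counts reallocations while appending. It uses std::vector as a stand-in for QVector (an assumption: the two containers grow in a similar geometric way, but the exact growth policy is implementation-specific, so the count without reserve() will differ between libraries). The names Line and countReallocations are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line { int x1, y1, x2, y2; };

// Appends `count` lines and returns how many times the capacity changed
// during the appends, i.e. how many times the storage was reallocated.
std::size_t countReallocations(int count, bool reserveFirst)
{
    assert(count >= 0);
    std::vector<Line> lines;
    if (reserveFirst)
        lines.reserve(static_cast<std::size_t>(count));

    std::size_t reallocations = 0;
    std::size_t lastCapacity = lines.capacity();
    for (int i = 0; i < count; ++i) {
        lines.push_back(Line{i / 10, i / 10, i / 10 + 300, i / 10});
        if (lines.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = lines.capacity();
        }
    }
    return reallocations;
}
```

With reserveFirst set, the function returns 0 for 5000 lines. Without it, the result is typically a small number well below 5000, because the capacity grows geometrically rather than one element at a time.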

d_stranz
9th June 2019, 21:08
If you do not need to change the graphics with every paint event, then use a QPainterPath as a member variable of the Widget class. During the paintEvent(), all you need to do is call QPainter::drawPath(). When the graphics need to be changed, re-create the content of the QPainterPath and then call QWidget::update() if needed. For even faster performance, draw to an offscreen QImage and use QPainter::drawImage() in the paint event.
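
The caching idea in this post can be sketched without Qt as a "dirty flag" around the stored geometry. Everything here is hypothetical naming (CachedGraph, setOffset, geometry); in a real widget the cached member would be the QPainterPath, geometry() would be what paintEvent() draws from, and setOffset() would be followed by QWidget::update().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line { double x1, y1, x2, y2; };

// Hypothetical stand-in for a widget that keeps its geometry as a member.
class CachedGraph {
public:
    explicit CachedGraph(int lineCount) : m_lineCount(lineCount)
    {
        assert(lineCount >= 0);
    }

    // Call when the data changes; the geometry is rebuilt lazily.
    void setOffset(double offset) { m_offset = offset; m_dirty = true; }

    // Call from the paint handler: rebuilds only if something changed.
    const std::vector<Line> &geometry()
    {
        if (m_dirty) {
            m_lines.clear();
            m_lines.reserve(static_cast<std::size_t>(m_lineCount));
            for (int i = 0; i < m_lineCount; ++i) {
                const double y = i / 10.0 + m_offset;
                m_lines.push_back(Line{i / 10.0, y, i / 10.0 + 300.0, y});
            }
            m_dirty = false;
            ++m_rebuilds;
        }
        return m_lines;
    }

    int rebuilds() const { return m_rebuilds; }

private:
    int m_lineCount;
    double m_offset = 0.0;
    bool m_dirty = true;
    int m_rebuilds = 0;
    std::vector<Line> m_lines;
};
```

Repeated calls to geometry() between changes cost nothing beyond returning the cached vector. The pattern only pays off when repaints outnumber data changes; if every frame moves every line, the rebuild happens on every paint anyway.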

xcxl
10th June 2019, 12:35
Hi everybody, thanks for the answers!


Because i is integer for i in <0,9> i/10 == 0.

Sorry, I didn't understand your sentence. You said I draw every line 10 times with the same coordinates, and I understood "the loop runs 10x cst_nbLines".


If you do not need to change the graphics with every paint event, then use a QPainterPath as a member variable of the Widget class. During the paintEvent(), all you need to do is call QPainter::drawPath(). When the graphics need to be changed, re-create the content of the QPainterPath and then call QWidget::update() if needed. For even faster performance, draw to an offscreen QImage and use QPainter::drawImage() in the paint event.

I tried each method (code below for other visitors); the results for 5000 lines are:

- With all myPainter.drawLine(myLine) => 17 ms
- With myPainter.drawLines(linesVector) and a reserve() => 17 ms
- With myPainter.drawPath() => 17 ms, it looks like this function draws every line as before...
- With myPainter.drawImage() => <1 ms to draw the image. Very fast, but if I animate by moving every point on each frame, it takes 30-40 ms to generate the QImage

So nothing under 17ms to draw my 5000 moving lines. I don't know why it takes so long, drawing 5000 simple points takes 1-2ms. Would it be faster to draw with OpenGL directly? 5k lines is nothing compared with complex 3D models...

The code :


const int cst_nbLines = 5000;

// TEST1
for (int i=0 ; i<cst_nbLines ; i++)
{
    QLine myLine(i/10, i/10, i/10+300, i/10);
    myPainter.drawLine(myLine);

}

// TEST2
QVector<QLineF> linesVector;
linesVector.reserve(cst_nbLines);
for (int i=0 ; i<cst_nbLines ; i++)
{
    //linesVector.append(QLine(i/10, i/10, i/10+300, i/10));
    linesVector.append(QLineF(i/10, i/10, i/10+300, i/10));
    counter++;
}
myPainter.drawLines(linesVector);

// TEST3
myPainter.drawPath(myPath);

// TEST4
generateQImage();
myPainter.drawImage(QPoint(0,0), *myImage);

d_stranz
10th June 2019, 15:10
The code :

Where does this code live? In the paintEvent()? The creation of the QPainterPath and QImage for tests 3 and 4 should occur outside the paintEvent.

OpenGL would probably be faster, but you probably know that in modern OpenGL you don't do any active drawing - you create the equivalent of a scene that the rendering pipeline and shaders process to put on screen. So your programming model will be similar to that used in the Qt Graphics / View architecture.

Lesiok
10th June 2019, 15:28
Maybe I repeat myself but... This is the original code :

const int cst_nbLines = 5000;

for (int i=0 ; i<cst_nbLines ; i++)
{
    QLine myLine(i/10, i/10, i/10+300, i/10);
    myPainter.drawLine(myLine);
}
This is the code giving the same final result but 10 times faster :
const int cst_nbLines = 5000;

for (int i=0 ; i<cst_nbLines/10 ; i++)
{
    QLine myLine(i, i, i+300, i);
    myPainter.drawLine(myLine);
}

d_stranz
10th June 2019, 16:12
This is the code giving the same final result but 10 times faster :

I think the OP understands he is drawing the same line 10 times. He misunderstood your original comment (as I did when I first read it) to say that he was executing the loop 10 times (to give 50k lines). His point is that even if these were 5000 individual lines with different coordinates, the time it takes to draw all of them is too long, no matter what method he uses.

xcxl
10th June 2019, 16:13
Where does this code live? In the paintEvent()? The creation of the QPainterPath and QImage for tests 3 and 4 should occur outside the paintEvent.

OpenGL would probably be faster, but you probably know that in modern OpenGL you don't do any active drawing - you create the equivalent of a scene that the rendering pipeline and shaders process to put on screen. So your programming model will be similar to that used in the Qt Graphics / View architecture.

Yes, this code is in the paintEvent (only one test at a time, of course), and the creation of the QPainterPath and QImage is in the QWidget constructor. I understand the problem with OpenGL. It is just that drawing 5000 lines doesn't look like a lot to me, but it may be normal for it to take 10-20 ms. I thought I was doing something wrong. Maybe the best way for me to draw very fast is to modify each pixel one by one myself on a QPixmap.


This is the code giving the same final result but 10 times faster

Yes I understand Lesiok but the goal of this example is to draw 5000 lines, in a full window (to avoid partial drawing) without scaling anything to be sure to have default parameters.
Of course the code could be :



for (int i=0 ; i<cst_nbLines ; i++)
{
    linesVector.append(QLineF(double(i)/10.0, double(i)/10.0, double(i)/10.0+100.0, double(i)/10.0));
}
It gives me the same time to draw, but looks less clear to me for a basic example.

Lesiok
11th June 2019, 07:12
OK, it's clear.

xcxl
18th June 2019, 14:12
For later visitors: I found some algorithms on the internet that are probably faster for drawing all my lines pixel by pixel on a QImage.
I will try Bresenham's line algorithm and see.

anda_skoa
18th June 2019, 15:32
I am pretty sure that is what the Qt Raster backend is using or an even more advanced algorithm.

And the raster backend uses lots of CPU extensions (e.g. SSE2, SSE4.1, AVX2) where possible.

Cheers,
_

xcxl
4th July 2019, 15:24
I am pretty sure that is what the Qt Raster backend is using or an even more advanced algorithm.

And the raster backend uses lots of CPU extensions (e.g. SSE2, SSE4.1, AVX2) where possible.

Cheers,
_

Hi anda_skoa, just for "fun" I tried to implement my own fast line drawing algorithm, hoping I could do better by knowing the line size, color, etc. in advance.
But you were right: I draw everything in 40 ms instead of 50 ms, which is a bit better but not really different. I think I will just keep the Qt code for drawing all the lines and live with a "low FPS".

I leave my code here for future visitors:


void RenderGraphThread::testFctFastLineDrawingAlgorithm (QImage &r_image, int32_t x1, int32_t y1, int32_t x2, int32_t y2)
{
    int x,y,dx,dy,dx1,dy1,px,py,xe,ye,i;
    dx=x2-x1;
    dy=y2-y1;
    dx1=std::abs(dx);
    dy1=std::abs(dy);
    px=2*dy1-dx1;
    py=2*dx1-dy1;
    if(dy1<=dx1)
    {
        if(dx>=0)
        {
            x=x1;
            y=y1;
            xe=x2;
        }
        else
        {
            x=x2;
            y=y2;
            xe=x1;
        }
        testFctPutPixelOnImage(r_image,x,y, 0xFF0B9F8A); // green
        for(i=0;x<xe;i++)
        {
            x=x+1;
            if(px<0)
            {
                px=px+2*dy1;
            }
            else
            {
                if((dx<0 && dy<0) || (dx>0 && dy>0))
                {
                    y=y+1;
                }
                else
                {
                    y=y-1;
                }
                px=px+2*(dy1-dx1);
            }
            testFctPutPixelOnImage(r_image,x,y, 0xFF0B9F8A); // green
        }
    }
    else
    {
        if(dy>=0)
        {
            x=x1;
            y=y1;
            ye=y2;
        }
        else
        {
            x=x2;
            y=y2;
            ye=y1;
        }
        testFctPutPixelOnImage(r_image,x,y, 0xFF0B9F8A); // green
        for(i=0;y<ye;i++)
        {
            y=y+1;
            if(py<=0)
            {
                py=py+2*dx1;
            }
            else
            {
                if ((dx<0 && dy<0) || (dx>0 && dy>0))
                {
                    x=x+1;
                }
                else
                {
                    x=x-1;
                }
                py=py+2*(dx1-dy1);
            }
            testFctPutPixelOnImage(r_image,x,y, 0xFF0B9F8A); // green
        }
    }
}

// TEST CODE FOR FAST DRAWING LINES
// Note: the address computation below assumes a 32-bit image format (4 bytes per pixel).
void RenderGraphThread::testFctPutPixelOnImage(QImage &r_image, const int32_t x, const int32_t y, const uint32_t _color)
{
    // Ignore pixels outside the image
    if (y<r_image.height() && x<r_image.width() && x>=0 && y>=0)
    {
        uchar *pFirstLine = r_image.bits();
        int32_t depth = 4; // bytes per pixel
        QRgb* rgbpixel = reinterpret_cast<QRgb*>(pFirstLine + r_image.width()*depth*y + x*depth);
        *rgbpixel = _color;
    }
}
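
For readers who want to try the idea without Qt, here is a compact sketch of the same technique on a plain pixel buffer. It is not the code above: it uses the common single-loop, error-accumulating form of Bresenham's algorithm, and the Framebuffer struct, drawLine and countColored are hypothetical names for this example. The stride member (pixels per row in memory) is kept separate from width because image rows can be padded; with QImage the equivalent information comes from bytesPerLine() or scanLine(), although for 32-bit formats the row size equals width*4, so the function above works for those.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical 32-bit framebuffer. `stride` is the number of pixels per row
// in memory and may be larger than `width` when rows are padded.
struct Framebuffer {
    int width;
    int height;
    int stride;
    std::vector<std::uint32_t> pixels;

    Framebuffer(int w, int h, int s)
        : width(w), height(h), stride(s),
          pixels(static_cast<std::size_t>(s) * static_cast<std::size_t>(h), 0u)
    {
        assert(w > 0 && h > 0 && s >= w);
    }

    // Writes one pixel; coordinates outside the image are ignored.
    void putPixel(int x, int y, std::uint32_t color)
    {
        if (x < 0 || y < 0 || x >= width || y >= height)
            return;
        pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(stride)
               + static_cast<std::size_t>(x)] = color;
    }
};

// Integer-only Bresenham line covering all directions, endpoints included.
void drawLine(Framebuffer &fb, int x1, int y1, int x2, int y2, std::uint32_t color)
{
    const int dx = std::abs(x2 - x1);
    const int dy = -std::abs(y2 - y1);
    const int sx = x1 < x2 ? 1 : -1;
    const int sy = y1 < y2 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        fb.putPixel(x1, y1, color);
        if (x1 == x2 && y1 == y2)
            break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x1 += sx; }   // step along x
        if (e2 <= dx) { err += dx; y1 += sy; }   // step along y
    }
}

// Helper for checking results: number of pixels holding `color`.
std::size_t countColored(const Framebuffer &fb, std::uint32_t color)
{
    std::size_t n = 0;
    for (std::uint32_t p : fb.pixels)
        if (p == color)
            ++n;
    return n;
}
```

A line from (0,0) to (9,0) sets 10 pixels, and a line whose endpoints lie outside the buffer is simply clipped pixel by pixel. Clipping per pixel is the simplest choice, not the fastest one; clipping the segment once before the loop would avoid the per-pixel bounds test.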

xcxl
18th January 2022, 10:43
Hi everybody,

Three years later, I am back to share my benchmark results and the answer to my own problem. I hope it will give the next developer some ideas.

A short summary of my initial problem: I have to draw 5000 moving lines on the screen, updating the line sizes at each frame (to produce a "moving network" animation).

I tried 6 ways to draw a lot of lines (10 000 to 50 000 lines) on screen to find the lowest cost for CPU/GPU. Here are the results:

TEST1 - Direct painting with QPainter::drawLine() :
=> 8.8 FPS for 25 000 lines
=> 20.0 FPS for 10 000 lines
Note : a bit laggy for a nice animation

TEST2 - Direct painting with QPainter::drawLines(), so calculate everything first and then draw all at once :
=> 9.0 FPS for 25 000 lines
=> 21.0 FPS for 10 000 lines

TEST3 - Draw everything on a QImage and then copy the image to screen
=> 8.5 FPS for 25 000 lines
=> 19.7 FPS for 10 000 lines

TEST4 - Use a QGraphicsScene to create all the line items, then just move them at each paintEvent()
=> 0.5 FPS for 2 000 lines
=> 4.9 FPS for 1 000 lines
Remark : a lot worse

TEST 5 - Using a QOpenGLWidget, and draw all lines with a GLSL fragment shader
=> 5.0 FPS for 2 000 lines
=> 10.0 FPS for 1 000 lines
Remark : not so good, fragment shaders are apparently not made for a big loop

TEST 6 - Using a QOpenGLWidget, and draw all lines with a very simple GLSL vertex & fragment shader (based on the HelloGL2 example), updating the coordinates of each line in a QOpenGLBuffer (=Uniform Buffer Object in GLSL terms)
=> 40.0 FPS, GPU 32% for 5 000 lines
=> 35.0 FPS, GPU 37% for 10 000 lines
=> 35.0 FPS, GPU 62% for 25 000 lines
=> 30.0 FPS, GPU 90% for 50 000 lines
Note1 : the Uniform Buffer allows me to draw more than 4096 lines, the limit on my GPU for fixed-size uniform arrays. The fixed-size array was my first attempt with GLSL, and it failed.
Note2 : CPU is less than 20% for all this TEST6.

This last TEST6 gave me enough performance (even on a small Intel HD Graphics 520 GPU), so I will stop there and use this method from now on.
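
The post does not show how the line coordinates are laid out in the buffer, so the following is only one plausible CPU-side sketch, not the author's code. It assumes each line is expanded into a thin quad made of two triangles (six 2D vertices, twelve floats), which is a common layout when drawing with GL_TRIANGLES. Segment and buildLineQuads are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Segment { float x1, y1, x2, y2; };

// Expands each segment into two triangles forming a quad of the given
// thickness. Output layout: x,y pairs, 6 vertices (12 floats) per segment.
std::vector<float> buildLineQuads(const std::vector<Segment> &segments, float thickness)
{
    assert(thickness > 0.0f);
    std::vector<float> vertices;
    vertices.reserve(segments.size() * 12);
    for (const Segment &s : segments) {
        const float dx = s.x2 - s.x1;
        const float dy = s.y2 - s.y1;
        const float len = std::sqrt(dx * dx + dy * dy);
        if (len == 0.0f)
            continue;                       // skip degenerate segments
        // Half-thickness offset perpendicular to the segment direction.
        const float nx = -dy / len * thickness * 0.5f;
        const float ny =  dx / len * thickness * 0.5f;
        const float ax = s.x1 + nx, ay = s.y1 + ny;   // start, one side
        const float bx = s.x1 - nx, by = s.y1 - ny;   // start, other side
        const float cx = s.x2 + nx, cy = s.y2 + ny;   // end, one side
        const float ex = s.x2 - nx, ey = s.y2 - ny;   // end, other side
        const float quad[12] = { ax, ay, bx, by, cx, cy,     // triangle 1
                                 cx, cy, bx, by, ex, ey };   // triangle 2
        vertices.insert(vertices.end(), quad, quad + 12);
    }
    return vertices;
}
```

An array built this way is what would be handed to the vertex buffer each frame; for 5000 segments it holds 60 000 floats. The per-frame CPU work is then a single linear pass, with rasterization left to the GPU.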


Have a good day.

d_stranz
18th January 2022, 19:37
Thanks for coming back to update your post!


TEST 6 - Using a QOpenGLWidget, and draw all lines with a very simple GLSL vertex & fragment shader

These results make sense since most of the work is being done on the GPU, which is optimized for just this kind of thing. Glad you could get the performance you needed.

ghorwin
25th January 2022, 07:15
Thanks for the insight: quick question, could you share the code piece where you update/transfer the buffer to the GPU? This might be another bottleneck on modern hardware architectures.

For example, you can use the classical:


m_vertexBufferObject.bind();
m_vertexBufferObject.allocate(m_vertexBufferData.data(), m_vertexBufferData.size()*sizeof(Vertex));
m_vertexBufferObject.release();


where the memory for the buffer is newly allocated. Or you could map the GPU memory to CPU memory like:



auto ptr = m_colorBufferObject.mapRange(0, m_colorBufferData.size() * sizeof(ColorRGBA),
QOpenGLBuffer::RangeInvalidateBuffer | QOpenGLBuffer::RangeWrite);
std::memcpy(ptr, m_colorBufferData.data(), m_colorBufferData.size()*sizeof(ColorRGBA));
m_colorBufferObject.unmap();


I'd be interested to know how that affects performance.

-Andreas

d_stranz
25th January 2022, 15:26
could you share the code piece where you update/transfer the buffer to the GPU?

Oops, sorry if I gave the impression that I have code that does this - from my reading of the OpenGL pipeline I have learned that once you get all of the data transferred to the GPU, and the shaders set up, the GPU pretty much takes over. It is optimized to do computations in parallel, so you can get huge speedups over doing things point by point or line by line in the CPU.

xcxl
12th February 2022, 18:29
As you said, I currently use the classical :


// Setup our vertex buffer object.
m_myVbo.create();
m_myVbo.bind();
m_myVbo.allocate(m_vertexData.constData(), m_floatCount * static_cast<int>(sizeof(GLfloat)));
m_myVbo.release();
Sorry, I don't yet have the skills to understand the second option with the CPU/GPU mapping.

The full project, configured for the GPU test (test 6) is available here : 13741
The VBO setup is in myGLwidget_vertAndFrag.cpp at line 269.

Could you send me the changes needed to test this option? I'm interested too.

xcxl
13th February 2022, 17:59
OK, I worked up the courage to understand the new method, and in the end it was not that hard.
I replaced :


m_myVbo.bind();
m_myVbo.write(0, m_vertexData.constData(), m_floatCount * static_cast<int>(sizeof(GLfloat)));
glDrawArrays(GL_TRIANGLES, 0, m_vertexCount);
m_myVbo.release();

by :


m_myVbo.bind();
auto ptr = m_myVbo.mapRange(0, m_vertexData.size()*int(sizeof(GLfloat)),
QOpenGLBuffer::RangeInvalidateBuffer | QOpenGLBuffer::RangeWrite);
memcpy(ptr, m_vertexData.data(), size_t(m_vertexData.size())*sizeof(GLfloat));
m_myVbo.unmap();

glDrawArrays(GL_TRIANGLES, 0, m_vertexCount);
m_myVbo.release();

But the performance was about the same; I can't see a difference in GPU monitoring (load of around 90% for 50 000 lines).