Render 2d Density graph with OpenGL



kkaz
4th December 2017, 09:36
I want to render a 2D array of values as a density graph, i.e. the color of each pixel in the X-Y plane is determined by mapping its Z value through a color scale.

I am able to do so using Qwt, more specifically QwtPlotSpectrogram. However, the data values change constantly and I want those changes repainted, which means constantly replotting the graph. I can achieve at most 10 refreshes per second before the GUI starts to hang and becomes unresponsive.

What is the best way to use OpenGL to render the density graph, based on these examples: the 2D painting example http://doc.qt.io/qt-5/qtopengl-2dpainting-example.html, or straight-up OpenGL as in the cube example http://doc.qt.io/qt-5/qtopengl-cube-example.html?

Thanks in advance

d_stranz
4th December 2017, 17:18
OpenGL will probably be faster since many of the graphics operations are pushed onto the GPU. Neither the cube nor the 2D painting example is a good fit for your problem, because the images being painted are static - the data that makes up the image or texture is created once, and OpenGL is then called on to handle transformations as the cube is rotated.

In your case, the bottleneck is going to be moving the data so the image can be rebuilt on the fly. I would look for examples from GPU accelerated multimedia or gaming frameworks. Think also of using QImage to represent your density map instead of point-by-point calculations. I think Qwt's spectrogram may also do interpolation, which will really slow things down.

kkaz
5th December 2017, 08:35
Thanks for the answer. Qwt's spectrogram does use interpolation, but I replaced it with a fast nearest-neighbor lookup.

However, Qt is the common ground here, which means I have to use something that can be rendered inside a Qt window. I don't know any libraries for GPU acceleration or gaming frameworks, but I guess I could write pure CUDA to transform the 2D vector of values into a buffer usable with
QImage(uchar *data, int width, int height, Format format, QImageCleanupFunction cleanupFunction = Q_NULLPTR, void *cleanupInfo = Q_NULLPTR) and then paint the QImage onto a QFrame? Is that worth trying?
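Roughly what I have in mind - only a sketch, all names are hypothetical, not working code from my project. A widget keeps the pixel buffer produced on the CUDA side and wraps it in a QImage for painting:

    #include <QFrame>
    #include <QImage>
    #include <QPainter>
    #include <vector>

    class DensityFrame : public QFrame
    {
    public:
        DensityFrame(int w, int h, QWidget *parent = nullptr)
            : QFrame(parent), width_(w), height_(h), pixels_(w * h) {}

        // Called in the GUI thread (e.g. via a queued connection from the
        // data thread) with values already mapped to 0xffRRGGBB.
        void setPixels(const std::vector<quint32> &pixels)
        {
            pixels_ = pixels;
            update(); // schedule a repaint
        }

    protected:
        void paintEvent(QPaintEvent *) override
        {
            // QImage does not copy the buffer, so pixels_ must outlive it.
            const QImage image(reinterpret_cast<const uchar *>(pixels_.data()),
                               width_, height_, QImage::Format_RGB32);
            QPainter painter(this);
            painter.drawImage(rect(), image); // scaled to the widget size
        }

    private:
        int width_, height_;
        std::vector<quint32> pixels_;
    };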

Uwe
5th December 2017, 09:05
I think Qwt's spectrogram may also do interpolation, which will really slow things down.
There is always some sort of resampling required, but whether it is done by interpolation or by faster algorithms like nearest neighbor is completely up to the user.

In qwt_plot_spectrogram.cpp (SVN trunk) there is a define DEBUG_RENDER that can be enabled to see how long it takes to create the image. With the spectrogram example (in release mode) I see results of about 12 ms for a resolution of 1813x1033 pixels on my box (Intel(R) Core(TM) i7-3770T CPU @ 2.50GHz), using a color table of 16384 colors. Reducing the resolution of the image of course has a significant effect on performance: e.g. for something like 493x307 I see values around ~1 ms.

So I would recommend checking how long the spectrogram example needs on the target hardware and comparing it to the time needed in the application. If there is a significant difference, it is usually the fault of the resampling process, which is part of the application code, and it might be worth checking whether something can be done there.
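For example, a trivial way to take that measurement in the application itself (assuming you have the QwtPlot at hand):

    #include <qwt_plot.h>
    #include <QElapsedTimer>
    #include <QDebug>

    // Call this in place of plot->replot() while profiling.
    void timedReplot(QwtPlot *plot)
    {
        QElapsedTimer timer;
        timer.start();

        plot->replot();

        qDebug() << "replot took" << timer.elapsed() << "ms";
    }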

Uwe

d_stranz
5th December 2017, 16:49
then paint the QImage onto a QFrame? Is that worth trying?

If you can do your image calculations on the CUDA side, then do everything there (in a shader, for example). The thing you want to avoid in any CUDA programming is moving data back and forth between CPU and GPU. The rule is: move it to the GPU and keep it there. If you do need to move data, do it with cudaMemcpy() in large blocks rather than point by point.
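A minimal sketch of the difference, assuming the frame is already a contiguous host buffer and devBuffer has been allocated with cudaMalloc():

    #include <cuda_runtime.h>
    #include <vector>

    void uploadFrame(const std::vector<double> &frame, double *devBuffer)
    {
        // One bulk transfer for the whole frame...
        cudaMemcpy(devBuffer, frame.data(),
                   frame.size() * sizeof(double), cudaMemcpyHostToDevice);

        // ...instead of thousands of tiny transfers like this:
        // for (size_t i = 0; i < frame.size(); ++i)
        //     cudaMemcpy(devBuffer + i, &frame[i], sizeof(double),
        //                cudaMemcpyHostToDevice);
    }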

Of course, this does break the hardware and OS independence you get from Qt, but you implied that your goal is more performance oriented.

Uwe
6th December 2017, 07:30
The thing you want to avoid in any CUDA programming is moving data back and forth between CPU and GPU. The rule is: move it to the GPU and keep it there.
As far as I understand, the situation is about data that changes frequently, which would be the opposite of "keep it there". And I would expect that preparing the data to be uploaded to the GPU for each frame is not faster than creating the image itself - it requires the same steps apart from the final mapping of each pixel to an RGB value.

And a GPU might have more cores to do things in parallel, but apart from that I don't see why it should be faster than the CPU in this situation.

@kkaz:


If you can show your implementation of YourRasterData::value(), we could check together if there is something that can be improved.
Is the resolution of your data below the resolution of the image on screen, and did you forget to implement YourRasterData::pixelHint()? (See the sketch below.)
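A minimal sketch of such a pixelHint(), assuming equidistant samples and hypothetical rows_/columns_ members in your QwtRasterData subclass:

    // Report the size of one data cell in plot coordinates, so that Qwt
    // can limit the rendered image to the resolution of the data.
    QRectF YourRasterData::pixelHint(const QRectF &) const
    {
        const QwtInterval ix = interval(Qt::XAxis);
        const QwtInterval iy = interval(Qt::YAxis);

        return QRectF(ix.minValue(), iy.minValue(),
                      ix.width() / columns_, iy.width() / rows_);
    }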



Uwe

d_stranz
6th December 2017, 20:29
As far as I understand, the situation is about data that changes frequently, which would be the opposite of "keep it there". And I would expect that preparing the data to be uploaded to the GPU for each frame is not faster than creating the image itself - it requires the same steps apart from the final mapping of each pixel to an RGB value.

And a GPU might have more cores to do things in parallel, but apart from that I don't see why it should be faster than the CPU in this situation.

Ah, my point was that if you can do the image calculation on the GPU side, then there is no need to move the data around. If you can't do that, then there may be no performance difference. The OP doesn't state whether his data comes from an external source or is being computed.

One thing that could be faster if performed on the GPU is bilinear interpolation for resampling, and there are published algorithms for that.
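For reference, the textbook CPU form of it - the GPU texture units do the same weighting in hardware. grid is assumed row-major, and (x, y) are fractional index coordinates with 0 <= x <= columns-2 and 0 <= y <= rows-2:

    #include <cmath>
    #include <vector>

    double bilinear(const std::vector<std::vector<double>> &grid,
                    double x, double y)
    {
        const int x0 = int(std::floor(x)), y0 = int(std::floor(y));
        const double fx = x - x0, fy = y - y0;

        // Weight the four surrounding samples by their distance to (x, y).
        return grid[y0][x0]         * (1 - fx) * (1 - fy)
             + grid[y0][x0 + 1]     * fx       * (1 - fy)
             + grid[y0 + 1][x0]     * (1 - fx) * fy
             + grid[y0 + 1][x0 + 1] * fx       * fy;
    }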

kkaz
7th December 2017, 12:53
The data comes from an external source, so the basic flow is the following:
A separate thread:
1. Waits for data from the external device (more than 70 measurements per second)
2. Processes the data (this is of minor importance, ~2-3 ms)
3. Flushes all data to disk, without waiting for the flush to actually complete
4. Stores the processed data in a 2D double array "ToRenderStructObject"

and the main GUI thread paints the "ToRenderStructObject" every once in a while, meaning that I am not losing painted data, but I don't paint measurements as they come; a sketch of this hand-over is below. Painting requires setting the data on the SpectrogramData object that overrides the value() function, setting some intervals, and calling replot().
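The hand-over between the two threads roughly looks like this (a sketch, the names are illustrative, not our actual code):

    #include <mutex>
    #include <vector>

    class ToRenderStore
    {
    public:
        // Acquisition thread: called for every measurement (70+ per second).
        void store(std::vector<std::vector<double>> frame)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frame_ = std::move(frame);
        }

        // GUI thread: called when the repaint timer fires, at a lower rate.
        std::vector<std::vector<double>> take() const
        {
            std::lock_guard<std::mutex> lock(mutex_);
            return frame_; // copy; intermediate frames are skipped, not queued
        }

    private:
        mutable std::mutex mutex_;
        std::vector<std::vector<double>> frame_;
    };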

The resolution of the data is relatively high, 1080x500 in one case and 2000x2000 in another, and they are painted in rather small-resolution widgets/graphs since both can't fit on the screen.
I have not reimplemented pixelHint(). Is that something that would help me?

My implementation of the value() function:

double Spectrogram::SpectrogramData::value(double x, double y) const {

    /* X represents the column number in our data, and Y the row */
    double column = x;
    double row = y;

    QwtInterval interval_x = interval(Qt::XAxis);
    QwtInterval interval_y = interval(Qt::YAxis);

    if (x > interval_x.maxValue() || x < interval_x.minValue()) {
        /* This should not be happening */
        return std::numeric_limits<double>::quiet_NaN();
    }
    if ((y > interval_y.maxValue()) || (y < interval_y.minValue())) {
        /* This should not be happening */
        return std::numeric_limits<double>::quiet_NaN();
    }

    /* Linear mapping of the plot coordinates onto our data index range */
    double index_x = (data_.size() - 1) * ((row - interval_y.minValue()) / (interval_y.maxValue() - interval_y.minValue()));
    double index_y = (data_.at(0).size() - 1) * ((column - interval_x.minValue()) / (interval_x.maxValue() - interval_x.minValue()));

    if (index_x < 0 || index_x >= data_.size()) {
        dt::ConsoleInfoL(dt::CRITICAL, "Spectrogram rendering, index_x out of boundaries");
    }
    if (index_y < 0 || index_y >= data_.at(0).size()) {
        dt::ConsoleInfoL(dt::CRITICAL, "Spectrogram rendering, index_y out of boundaries");
    }

    /* Pick the nearest neighbour */
    return data_[std::round(index_x)][std::round(index_y)];
}

So the question about CUDA was: if I squeeze a CUDA QImage-building algorithm into the separate thread and paint the image onto the QFrame inside the main GUI thread, would it be any faster? There is no required refresh rate that needs to be achieved, I'm just wondering if I could squeeze in a little GPU time anywhere.

Uwe
7th December 2017, 15:41
Stores the processed data in a 2D double array "ToRenderStructObject"

Could you please also show the definition of ToRenderStructObject, and did you check how much time it takes to fill it?
What type of color map do you use?
Did you enable DEBUG_RENDER, and what values do you see?
And are you using more than one core?




The resolution of the data is relatively high, 1080x500 in one case and 2000x2000 in another, and they are painted in rather small-resolution widgets/graphs since both can't fit on the screen.
I have not reimplemented pixelHint(). Is that something that would help me?
The image is rendered at the minimum of the resolutions of data and screen. But without implementing pixelHint() the resolution of the data is not known, and it will always end up at screen resolution.
Implementing a pixelHint() helps when the data resolution is below the resolution of the plot canvas (= its size in pixels) - so in your case probably not.


My implementation of the value() function:
This method is called for every pixel - for a resolution of 1000x1000 that is a million calls, so removing any pointless operation from it will have a significant effect.



Better do these "will never happen" checks in debug mode only.
Everything that can be calculated in advance should be done in YourRasterData::initRaster(). E.g. the factor (data_.size() - 1) / (interval_y.maxValue() - interval_y.minValue()) is constant and does not need to be calculated again and again (see the sketch below).
Don't use std::round() - casting to int should be good enough.
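Putting these suggestions together, a sketch of how it could look with the constant factors moved into initRaster() - rowScale_, colScale_, xMin_, yMin_ would be new members:

    void Spectrogram::SpectrogramData::initRaster(const QRectF &area, const QSize &raster)
    {
        const QwtInterval ix = interval(Qt::XAxis);
        const QwtInterval iy = interval(Qt::YAxis);

        // Constant for the whole frame: grid cells per axis unit.
        rowScale_ = (data_.size() - 1) / (iy.maxValue() - iy.minValue());
        colScale_ = (data_.at(0).size() - 1) / (ix.maxValue() - ix.minValue());
        yMin_ = iy.minValue();
        xMin_ = ix.minValue();

        QwtRasterData::initRaster(area, raster);
    }

    double Spectrogram::SpectrogramData::value(double x, double y) const
    {
        // Nearest neighbour without std::round(): add 0.5 and truncate.
        // Range checks are left to debug builds.
        const int row = int((y - yMin_) * rowScale_ + 0.5);
        const int col = int((x - xMin_) * colScale_ + 0.5);

        return data_[row][col];
    }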


But the implementation is not totally bad, and accessing data_ might be the bottleneck.

Why don't you simply use QwtMatrixRasterData - or at least copy the code and strip it down to the minimum you need? It is faster than your code, and the main reasons for not using it (calculating the values on the fly, or returning values from the original buffer without an extra copy) seem to be irrelevant in your situation.

Concerning using the GPU: as long as you don't have an OpenGL canvas, you would need to:

1. prepare the data
2. upload the data
3. create something with the GPU
4. download the result
5. translate it into an image
6. draw the image


Uwe

kkaz
8th December 2017, 09:06
In one case the external source produces 1D vectors, so the ToRenderStruct is just a 2D vector of older measurements that I shift whenever a new one comes (using the vector assignment operator); in the other case the external source produces 2D array data, so I just use the assignment operator to set the 2D vector. So the separate data-gathering and processing thread always produces a 2D vector, and the main GUI thread renders that 2D vector. I have not measured how much time it takes to fill it, but it must be irrelevant to the rendering process, since it's done on the separate data-gathering thread, and that thread is able to achieve even 100 measurements per second.

The color map I use:


color_map_ = new QwtLinearColorMap(QColor(0, 0, 141), QColor(131, 0, 0));
color_map_->addColorStop(8.0 / 64.0, QColor(0, 0, 252));
color_map_->addColorStop(24.0 / 64.0, QColor(0, 254, 254));
color_map_->addColorStop(40.0 / 64.0, QColor(255, 252, 0));
color_map_->addColorStop(56.0 / 64.0, QColor(254, 0, 1));
...
QwtPlotSpectrogram * spectrogram = ...
...
spectrogram->setColorMap(...)


With DEBUG_RENDER I get "renderImage QSize(489, 193) 15" for a 2D vector of size 700x1280.

I run on an i5 750 processor, but the final version will run on an i7 U-series laptop processor.

The main reason for not using QwtMatrixRasterData is that I have to use setValueMatrix() to set the data, which requires a QVector holding a 1D flattening of the 2D array; that adds a big copy overhead, given that everything else in our code uses std::vector and 2D std::vector (see the sketch below).
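I.e. just to hand the data over I would need glue code like this (illustrative sketch):

    #include <qwt_matrix_raster_data.h>
    #include <QVector>
    #include <vector>

    void setMatrix(QwtMatrixRasterData &raster,
                   const std::vector<std::vector<double>> &data)
    {
        const int numColumns = int(data.at(0).size());

        QVector<double> values;
        values.reserve(int(data.size()) * numColumns);
        for (const auto &row : data)
            for (double v : row)
                values.append(v); // the extra copy I want to avoid

        raster.setValueMatrix(values, numColumns);
    }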

I think I'll leave the GPU for the moment and stick to Qwt plots, since it's more important to have accurate axes, names and so on, which would take a significant amount of time to reproduce with a custom CUDA-created image. I will try to optimize value() with your suggestions.

Uwe
8th December 2017, 12:04
With DEBUG_RENDER I get "renderImage QSize(489, 193) 15"
Considering that this is only for 489x193 pixels (5% of the number of pixels in my test), this doesn't sound that good - so it might be worth optimizing your value() method.

On the other hand you wrote that you run into trouble at 10 Hz, which indicates that updates take >= 100 ms. But considering that you have 2 plots, rendering at ~15 ms each would account for only ~30 ms.
So there are > 70 ms that you lose somewhere else!

Uwe