OpenMP on Windows with MinGW32: almost no acceleration in release mode
Hello,
I created a console project to test OpenMP inside Qt Creator with MinGW32. My Qt version is 5.3 and the compiler is mingw482_32.
Here is the project file:
Code:
#-------------------------------------------------
#
# Project created by QtCreator 2021-06-18T17:05:54
#
#-------------------------------------------------
QT += core
QT -= gui
TARGET = OpenMPTest
CONFIG += console
CONFIG -= app_bundle
TEMPLATE = app
win32:CONFIG(release, debug|release):QMAKE_CXXFLAGS += -std=c++11 -O3
QMAKE_CXXFLAGS += -fopenmp
QMAKE_LFLAGS += -fopenmp
SOURCES += main.cpp \
testOpenMP.cpp
HEADERS += \
testOpenMP.h
And here is the content of testOpenMP.cpp:
Code:
#include <iostream>
#include <cstdio>   // getchar
#include <math.h>
#include <time.h>
#include <omp.h>
#define SIZE_ARRAY 20000
int array_floor_total[SIZE_ARRAY];
void do_compute(int j)
{
    double total = 0;
    for (int i = 0; i < SIZE_ARRAY; ++i)
        total += sqrt(i+j);
    int floor_total;
    floor_total = floor(total);
    array_floor_total[j] = floor_total % 2; // the various threads need to write to common memory
}
void test_accellerate_loop()
{
    int end = SIZE_ARRAY;
    clock_t t1 = clock();
    for (int i = 0; i < end; ++i)
        do_compute(i);
    clock_t t2 = clock();
    std::cout << "time taken (no acceleration)" << t2 - t1 << "\n";
    clock_t t3 = clock();
    #pragma omp parallel for // This OMP directive tells the compiler to parallelise the next loop
    for (int i = 0; i < end; ++i)
        do_compute(i);
    clock_t t4 = clock();
    std::cout << "time taken (with acceleration)" << t4 - t3 << "\n";
    std::cout << "Press return\n";
    getchar(); // pause
}
The main file is essentially a call to test_accellerate_loop();
Now here's the unexpected fact:
Whereas the acceleration is considerable in Debug mode, it is almost nonexistent in Release mode.
Here is one MinGW32 output:
time taken (no acceleration)840
time taken (with acceleration)847
I compiled the same testOpenMP.cpp in Visual Studio. There the difference is big, and when run outside VS the program is even faster.
Visual Studio executable, run from the command line:
time taken (no acceleration)1076
time taken (with acceleration)203
Any explanation? Do I have the wrong optimiser flags?
Re: OpenMP on Windows with MinGW32: almost no acceleration in release mode
This is not a Qt Programming question, so I have moved your thread to the General Programming section.
There are many explanations I can think of:
1 - g++ is very good at optimizing loops (at -O3 it can also vectorize them), so OpenMP doesn't add much on top.
2 - g++ has optimized and inlined your simple function.
3 - the OpenMP implementation in g++ has significant overhead, which eats up the gain when a simple calculation is evaluated only a small number of times.
4 - MSVC is not very good at optimizing loops.
5 - MSVC did not inline your function and left it as a function call.
6 - MSVC produces slower code, so OpenMP has more room to give a performance boost.
7 - differences in the compilation flags for g++ vs. MSVC gave different degrees of optimization of the non-OpenMP code.
Also, most compilers turn off optimization in debug mode, so using a debug mode build to test performance isn't really valid. Depending on the compiler, optimization can inline function calls, unroll loops, or make other changes so that the release and debug mode versions of the program aren't really the same.
I think you need to first research the compiler flags to make sure your non-OpenMP code is built on as close to an apples-to-apples basis as possible so you really do have the same starting point for comparison. And second, you need to make your test calculation more difficult, so it takes longer to execute and can't be optimized by the compiler, and you need to evaluate it maybe millions of times, not just 20000.
I don't know if it is the case, but because all threads write to the same global array from your compute function, there could also be some contention on that shared memory (threads invalidating each other's cache lines, for instance) that partly defeats the parallelization in g++.