
Thread: Maximum Speed Needed

  1. #1
    Join Date
    Sep 2008
    Posts
    60
    Thanks
    20
    Thanked 3 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Maximum Speed Needed

    Hi everybody,

    I'm creating a mathematical application with a very long calculation. On Windows, built with the VC++ Express compiler and the .NET Framework, it needs 3 min to finish (using BackgroundWorker, which is something like QThread).

    Well, I have now restarted the project using Qt4 and created a QThread, but unfortunately, when I compile it in release mode, it needs more than 15 min (on both Linux and Windows).

    So, is there any way to speed it up as much as possible, using all CPUs? Is there some qmake option that allows this kind of performance?

    Thank you very much for any kind of help,
    Cheers,
    Louis

  2. #2
    Join Date
    Dec 2006
    Posts
    849
    Thanks
    6
    Thanked 163 Times in 151 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Default Re: Maximum Speed Needed

    Have you
    * checked that you have turned on compiler optimizations?
    * perhaps changed your code in other ways (is the algorithm really the same)?
    * checked whether BackgroundWorker is perhaps able to split the work not into one thread but into a whole bunch of threads? (I.e. check if the other app has more than 2 threads running.)

  3. #3
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: Maximum Speed Needed

    Are you talking about compilation time or execution time?
    Your biological and technological distinctiveness will be added to our own. Resistance is futile.

    Please ask Qt related questions on the forum and not using private messages or visitor messages.


  4. #4
    Join Date
    Sep 2008
    Posts
    60
    Thanks
    20
    Thanked 3 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Hi caduel and wysota

    >Are you talking about compilation time or execution time?
    Execution time.

    > * checked that you have turned on compiler optimizations?
    Well, in my Qt4 project I set CONFIG += release, and when it compiles it passes the -O2 option to the compiler.

    > * perhaps changed your code in other ways (is the algorithm really the same)?
    The algorithm is the same (copy/paste).

    > * checked if BackGroundWorker perhaps is able to split the work not into one thread but perhaps a whole bunch of threads? (I.e. check if the other app has more than 2 threads running.)

    No, it's only 1 thread.

    Thank you very much for all your attention.
    Cheers,
    Louis

  5. #5
    Join Date
    Jan 2006
    Posts
    132
    Thanked 16 Times in 16 Posts
    Qt products
    Qt4
    Platforms
    Windows

    Default Re: Maximum Speed Needed

    Qt can also be compiled with Visual Studio; that should give you the same speed.

    I'm not sure if it is compilable with Intel's C++ compiler, but that one would probably be even better than Microsoft's.

  6. #6
    Join Date
    Sep 2008
    Posts
    60
    Thanks
    20
    Thanked 3 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Hi seneca,

    > Qt can also be compiled with Visual Studio; that should give you the same speed.
    Yes, this is the point: on Windows I'm using VC++ with Qt4, on Linux g++, and I get 15 min!

    Cheers,
    Louis

  7. #7
    Join Date
    Dec 2006
    Posts
    426
    Thanks
    8
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Default Re: Maximum Speed Needed

    How do you compile the one that runs 3 minutes, and how do you compile the other one that runs 15 minutes?

    You should be able to compare the two builds and make adjustments. Make sure they use the same flags, the same compiler, etc.

    I had the same issue, but my program runs for 3 days, so it is harder to compare because I need to wait 3 days to get a result...

  8. #8
    Join Date
    Mar 2009
    Posts
    25
    Thanked 2 Times in 2 Posts

    Default Re: Maximum Speed Needed

    Quote Originally Posted by lixo1 View Post
    Well, I have now restarted the project using Qt4 and created a QThread, but unfortunately, when I compile it in release mode, it needs more than 15 min (on both Linux and Windows).
    Is it implied here that if you compile with the Qt debug library it runs faster, in less than 15 min?

    Quote Originally Posted by lixo1 View Post
    The algorithm is the same (copy/paste).
    If you can copy and paste and it is convenient, I suggest you use some other third-party threading library on Linux to test that piece of code and see how long it takes.

    If only .NET can run that fast, we can do nothing about it. It is Microsoft's framework running on Microsoft's OS :P


    Also, you can time the various parts of your code to see whether a certain part takes very long, and then optimize it.
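A minimal sketch of that kind of timing, using only the standard library (`std::chrono`, so a C++11 compiler is assumed); in a Qt4 application `QTime` could play the same role. The helper name `secondsTaken` and the dummy workload are made up for illustration and are not part of the original program.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helper: runs any callable once and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double secondsTaken(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in for one part of the real calculation.
double dummyWorkload(int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += std::sin(0.001 * i);
    return sum;
}
```

Wrapping each suspicious part in a lambda, for example `secondsTaken([&] { result = dummyWorkload(1000000); })`, and printing the returned value gives a rough picture of where the time goes before any optimization is attempted.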

  9. #9
    Join Date
    Sep 2008
    Posts
    60
    Thanks
    20
    Thanked 3 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Hi everybody,

    > Is it implied here that if you compile with Qt debug library it runs faster and less than 15min?
    No, release mode is faster than debug mode.

    Well, on Windows I created a VC++ project using TEMPLATE = vcapp; qmake creates a VC++ project with all the same compiler options as my .NET application, except the /clr option (I will not link it with the .NET Framework). The result is good: now it takes 5 min instead of 15 min. Great, thank you.

    The problem is on Linux (in my case Ubuntu 8.10) with g++: it takes 15 min. Is it possible that g++ cannot optimize it? In the past I worked on Windows using Dev-C++ (which uses g++ via MinGW) and unfortunately this same program, without a graphical interface, took about 1 h to finish.

    Thank you very much for your help!
    Cheers,
    Louis

  10. #10
    Join Date
    Jan 2006
    Location
    travelling
    Posts
    1,116
    Thanks
    8
    Thanked 127 Times in 121 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Did you think about turning on SSE (or other SIMD instruction sets available on your PC)? The compiler flags below are known to dramatically improve the performance of apps making intensive use of floating-point computations, which is probably your case:
    QMAKE_CXXFLAGS += -march=native -mfpmath=sse
    To figure out which extra instruction sets are supported by your CPU, you can examine the output of "cat /proc/cpuinfo". If the "flags" field contains at least "sse", you can safely use the above flags (if "sse2", "sse3", "ssse3" or "sse4" are present it is even better).

    Please note, however, that such binaries will not be portable.

    Also, did you consider using QtConcurrent to distribute the computations across multiple threads?
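As a rough illustration of distributing independent pieces of work, here is a sketch that uses `std::async` from the C++11 standard library as a stand-in for QtConcurrent (whose `QtConcurrent::run` and `QtConcurrent::mappedReduced` serve a similar purpose). The functions `partialSum`, `parallelSum` and `serialSum` are hypothetical; `partialSum` stands for everything computed for one value of the outermost loop index, assuming those iterations do not depend on each other.

```cpp
#include <cassert>
#include <cmath>
#include <future>
#include <vector>

// Hypothetical stand-in for the work done for one value of the outermost
// loop index; the real body would be the nested loops of the calculation.
double partialSum(int outerIndex, int innerSteps)
{
    double sum = 0.0;
    for (int k = 0; k < innerSteps; ++k)
        sum += std::sin(0.01 * k) * (outerIndex + 1);
    return sum;
}

// Starts one asynchronous task per outer index, then adds up the
// independent partial results in index order.
double parallelSum(int outerCount, int innerSteps)
{
    assert(outerCount >= 0);
    std::vector<std::future<double>> tasks;
    for (int i = 0; i < outerCount; ++i)
        tasks.push_back(std::async(std::launch::async, partialSum, i, innerSteps));

    double total = 0.0;
    for (auto &task : tasks)
        total += task.get();   // blocks until that task has finished
    return total;
}

// Same computation on a single thread, for comparison.
double serialSum(int outerCount, int innerSteps)
{
    double total = 0.0;
    for (int i = 0; i < outerCount; ++i)
        total += partialSum(i, innerSteps);
    return total;
}
```

Each task only writes to its own local `sum`, so no locking is needed; the shared total is assembled after the tasks have finished.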
    Current Qt projects : QCodeEdit, RotiDeCode

  11. #11
    Join Date
    Sep 2008
    Posts
    60
    Thanks
    20
    Thanked 3 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Hi fullmetalcoder,

    Thank you for your help, but unfortunately I got 15 min once again (my CPU flags include sse and sse2).

    It's very strange!
    Now, about multiple threads: in my case the "thread" is very simple, it's just a nest of for() loops like the code below:

    Qt Code:
    for (i=0;i<12;i++) {compt[i][0]=0;}
    for (ii=0;ii<=2;ii=ii+1)
    {
    for (i=0;i<12;i++) {compt[i][1]=0;}
    for (jj=0;jj<=2;jj=jj+1)
    {
    for (i=0;i<12;i++) {compt[i][2]=0;}
    for (kk=0;kk<=2;kk=kk+1)
    {
    for (i=0;i<12;i++) {compt[i][3]=0;}
    for (ll=0;ll<=2;ll=ll+1)
    {
    for (i=0;i<12;i++) {compt[i][4]=0;}
    for (mm=0;mm<=2;mm=mm+1)
    {
    for (i=0;i<12;i++) {compt[i][5]=0;}
    for (nn=0;nn<=2;nn=nn+1)
    {
    for (i=0;i<12;i++) {compt[i][6]=0;}
    for (oo=0;oo<=2;oo=oo+1)
    {
    for (i=0;i<12;i++) {compt[i][7]=0;}
    for (pp=0;pp<=2;pp=pp+1)
    {

    for (i=0;i<12;i++) {compt[i][8]=0;}
    for (psi=0;psi<2*M_PI;psi=psi+step)
    {
    for (i=0;i<12;i++) {compt[i][9]=0;}
    for (phi=0;phi<2*M_PI;phi=phi+step)
    {
    for (i=0;i<12;i++) {compt[i][10]=0;}
    for (teta=0;teta<M_PI;teta=teta+step)
    { //*V 90
    compt[0][10]=compt[0][10] + sin(teta)*(0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]));

    compt[1][10]=compt[1][10] + sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp])
    + 2 *sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);

    compt[2][10]=compt[2][10] + sin(teta)*0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);
    //*H 90
    compt[3][10]=compt[3][10] + sin(teta)*(0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]));
    compt[4][10]=compt[4][10] + sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp])
    + 0.5*sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);
    compt[5][10]=compt[5][10] + sin(teta)*0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);
    //Cancel process if stop requested
    if(isTerminated){goto end;}

    }
    for (i=0;i<12;i++) {compt[i][9]=compt[i][9]+compt[i][10];}
    }
    for (i=0;i<12;i++) {compt[i][8]=compt[i][8]+compt[i][9];}
    }
    for (i=0;i<12;i++) {compt[i][7]=compt[i][7]+compt[i][8];}
    }
    for (i=0;i<12;i++) {compt[i][6]=compt[i][6]+compt[i][7];}
    }
    for (i=0;i<12;i++) {compt[i][5]=compt[i][5]+compt[i][6];}
    }
    for (i=0;i<12;i++) {compt[i][4]=compt[i][4]+compt[i][5];}
    }
    for (i=0;i<12;i++) {compt[i][3]=compt[i][3]+compt[i][4];}
    }

    //Increment ProgressBar + set number in %
    if(DipCoefChecked){ percent++; emit progbar(percent);}
    else{ percent +=2; emit progbar(percent); }

    for (i=0;i<12;i++) {compt[i][2]=compt[i][2]+compt[i][3];}
    }
    for (i=0;i<12;i++) {compt[i][1]=compt[i][1]+compt[i][2];}
    }
    for (i=0;i<12;i++) {compt[i][0]=compt[i][0]+compt[i][1];}
    }
    Last edited by lixo1; 20th March 2009 at 18:19. Reason: spelling error

  12. #12
    Join Date
    Jan 2006
    Location
    travelling
    Posts
    1,116
    Thanks
    8
    Thanked 127 Times in 121 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    So you added the flags to your project and it did not change anything, right? Well, did you

    1. rebuild your whole app (a change of compiler flags is not picked up by make's dependency check, AFAIK)
    2. check that these flags were actually passed to the compiler (a quick look at the compile log would be enough); there could be something as simple as a spelling error in the variable name or something similar

    Maybe I'm a bit touchy, but your code is among the worst formatted I have ever seen, and this has a dramatic effect on readability...

    Another important fact is that you use HUGE assignments in the inner loop, which are probably not optimized very well by GCC because it cannot make any assumptions about the values returned by sin() and eulerb() (specifically, it cannot guess that several calls to these with the exact same parameters will return the same result). Besides, most, if not all, of your assignments could be changed to use the += operator (for readability and to make sure the compiler optimizes as much as possible). You could also compute step*step*step only once, outside the loop (making it const could help optimizations as well).

    Suggested refactoring (which I am not entirely doing for you):
    Qt Code:
    for (i=0;i<12;i++)
    {compt[i][0]=0;}

    // step never changes, so its cube is computed once, outside all loops
    const double stepCube = step * step * step;

    for (ii=0;ii<=2;ii=ii+1)
    {
    for (i=0;i<12;i++)
    {compt[i][1]=0;}

    for (jj=0;jj<=2;jj=jj+1)
    {
    for (i=0;i<12;i++)
    {compt[i][2]=0;}

    for (kk=0;kk<=2;kk=kk+1)
    {
    for (i=0;i<12;i++)
    {compt[i][3]=0;}

    for (ll=0;ll<=2;ll=ll+1)
    {
    for (i=0;i<12;i++)
    {compt[i][4]=0;}

    const double gammaIJKL = gamma_a[ii][jj][kk][ll];
    for (mm=0;mm<=2;mm=mm+1)
    {
    for (i=0;i<12;i++)
    {compt[i][5]=0;}

    for (nn=0;nn<=2;nn=nn+1)
    {
    for (i=0;i<12;i++)
    {compt[i][6]=0;}

    for (oo=0;oo<=2;oo=oo+1)
    {
    for (i=0;i<12;i++)
    {compt[i][7]=0;}

    for (pp=0;pp<=2;pp=pp+1)
    {
    for (i=0;i<12;i++)
    {compt[i][8]=0;}

    const double gammaMNOP = gamma_a[mm][nn][oo][pp];
    for (psi=0;psi<2*M_PI;psi=psi+step)
    {
    for (i=0;i<12;i++)
    {compt[i][9]=0;}

    for (phi=0;phi<2*M_PI;phi=phi+step)
    {
    for (i=0;i<12;i++)
    {compt[i][10]=0;}

    for (teta=0;teta<M_PI;teta=teta+step)
    { //*V 90
    // each distinct sin()/eulerb() value is computed once per iteration
    const double sinTheta = sin(teta);
    const double stsc = sinTheta * stepCube;
    const double euler1i = eulerb(1,ii,teta,phi,psi);
    const double euler2i = eulerb(2,ii,teta,phi,psi);
    const double euler1i2i = euler1i + euler2i;

    const double euler0j = eulerb(0,jj,teta,phi,psi);
    const double euler0k = eulerb(0,kk,teta,phi,psi);
    const double euler0l = eulerb(0,ll,teta,phi,psi);

    const double euler1j = eulerb(1,jj,teta,phi,psi);
    const double euler1k = eulerb(1,kk,teta,phi,psi);
    const double euler1l = eulerb(1,ll,teta,phi,psi);

    const double euler2j = eulerb(2,jj,teta,phi,psi);
    const double euler2k = eulerb(2,kk,teta,phi,psi);
    const double euler2l = eulerb(2,ll,teta,phi,psi);

    // m/n/o/p counterparts used below, defined the same way as the i/j/k/l ones
    const double euler1m2m = eulerb(1,mm,teta,phi,psi) + eulerb(2,mm,teta,phi,psi);
    const double euler0n = eulerb(0,nn,teta,phi,psi);
    const double euler0o = eulerb(0,oo,teta,phi,psi);
    const double euler0p = eulerb(0,pp,teta,phi,psi);

    compt[0][10] += stsc * 0.5 *
    euler1i2i*euler0j*euler0k*euler0l*gammaIJKL
    *
    euler1m2m*euler0n*euler0o*euler0p*gammaMNOP;
    ...
    Of course, since the step is fixed during the whole loop, you could (and probably should) precompute most of the values (sinTheta, eulerXXX...) and put them in a vector/hash (for fast lookup) to reuse them in the inner loop instead of recomputing them every time.
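A minimal sketch of such a lookup table for the sine values, assuming the table is filled with exactly the same loop as the one that later walks the angles, so index t always corresponds to the t-th angle. The names `makeSinTable` and `sumWithTable` are invented for the example; the same pattern would apply to the eulerb() values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds sin(theta) for theta = 0, step, 2*step, ... while theta < upper,
// the same sequence of angles the innermost loop walks through.
std::vector<double> makeSinTable(double upper, double step)
{
    assert(step > 0.0);
    std::vector<double> table;
    for (double theta = 0.0; theta < upper; theta += step)
        table.push_back(std::sin(theta));
    return table;
}

// Example consumer: walks the angles `repeats` times (as the outer loops do)
// but never calls sin() again, it only reads the table.
double sumWithTable(const std::vector<double> &sinTable, int repeats)
{
    double sum = 0.0;
    for (int r = 0; r < repeats; ++r)
        for (std::size_t t = 0; t < sinTable.size(); ++t)
            sum += sinTable[t];   // lookup instead of recomputation
    return sum;
}
```

A plain vector indexed by the loop counter is usually preferable to a hash here, because the angles are visited in order and the lookup is then a simple array access.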

    With all these optimizations I'm willing to bet you can go below one minute, despite your algorithm being rather suboptimal.

  13. The following user says thank you to fullmetalcoder for this useful post:

    lixo1 (20th March 2009)

  14. #13
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: Maximum Speed Needed

    I would do such computations in assembly if I were you... If you have a multicore or multicpu machine available try parallelizing the computation. Furthermore replace the for loops with something faster. The first loop can be replaced by memset() for example. Also optimize your code to make proper use of the data cache in your processor. Cache partial results - there is no need to recompute sin(teta) and all the other stuff all the time.
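One caveat on the memset() idea, with a sketch: in the posted code the loop clears compt[i][0] for i = 0..11, which is a strided column, and memset() can only clear one contiguous run of bytes. The sketch below therefore assumes the array is transposed to [level][term]; the struct `Accumulators`, the sizes 11 and 12 (read off the posted indices) and the function names are illustrative assumptions, not the original declarations.

```cpp
#include <cassert>
#include <cstring>

const int kLevels = 11;   // one accumulator row per loop level (assumed)
const int kTerms  = 12;   // the 12 accumulated quantities

// Transposed layout: acc[level][term], so one level is 12 contiguous doubles
// that fit in a couple of cache lines.
struct Accumulators
{
    double acc[kLevels][kTerms];
};

// Zeroes one level in a single call. All-bits-zero is 0.0 for IEEE 754
// doubles, which is what mainstream desktop compilers use.
void clearLevel(Accumulators &a, int level)
{
    assert(level >= 0 && level < kLevels);
    std::memset(a.acc[level], 0, sizeof a.acc[level]);
}

// Adds the finished inner level into the enclosing one, as the original
// "compt[i][n] = compt[i][n] + compt[i][n+1]" loops do.
void foldLevel(Accumulators &a, int level)
{
    assert(level >= 0 && level + 1 < kLevels);
    for (int i = 0; i < kTerms; ++i)
        a.acc[level][i] += a.acc[level + 1][i];
}
```

With only 12 elements per level the gain from memset() itself is likely small; the contiguous layout is the part that helps the data cache.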

    And lose the goto


  15. The following user says thank you to wysota for this useful post:

    lixo1 (20th March 2009)

  16. #14
    Join Date
    Sep 2008
    Posts
    60
    Thanks
    20
    Thanked 3 Times in 2 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Hi fullmetalcoder,

    Thank you very much for all your patience, your help and the example; I will modify all the code.
    I'm an amateur (a physicist, in fact), so I'm very sorry for the chaotic readability.

    wysota > And lose the goto
    What do you recommend using instead?
    I don't use "return;" because after the loop I delete my arrays.

    Cheers,
    Louis

  17. #15
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: Maximum Speed Needed

    Quote Originally Posted by lixo1 View Post
    wysota > And lose the goto
    What do you recommend using instead?
    I don't use "return;" because after the loop I delete my arrays.
    There are numerous ways to do it. The easiest one is to allocate all the data on the stack and not on the heap (the latter using the "new" operator). Another is to delete the data using an external function call and then call return. You can also use the "break" keyword to bail out of a loop.
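A small sketch of the idea, assuming a cancellation flag similar to the isTerminated variable from the posted code. Here the buffer is a std::vector, so it is released automatically on every exit path and a plain return can replace the goto. The function `runUntilCancelled` and its parameters are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker. Returns the number of outer iterations that were
// fully completed before a stop was requested.
int runUntilCancelled(const std::atomic<bool> &stopRequested,
                      int outerCount, int innerCount)
{
    assert(outerCount >= 0 && innerCount >= 0);
    std::vector<double> scratch(1024, 0.0);   // replaces new[] / delete[]

    for (int i = 0; i < outerCount; ++i) {
        for (int j = 0; j < innerCount; ++j) {
            if (stopRequested.load())
                return i;                     // scratch is freed here as well
            scratch[static_cast<std::size_t>(i + j) % scratch.size()] += 1.0;
        }
    }
    return outerCount;
}
```

Because the vector's destructor runs on both the early and the normal return, there is no cleanup code left to jump to.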


  18. #16
    Join Date
    Dec 2006
    Posts
    426
    Thanks
    8
    Thanked 18 Times in 17 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Wink Re: Maximum Speed Needed

    You are computing the same thing a billion times.

    For instance, you put sin( teta ) in the innermost loop and repeatedly compute the same sin( teta ) in every single outer loop. You can make a sin( teta ) array by computing it only once. sin( teta ) is a very slow operation.

    Also, you have step * step * step, which is another constant, but it is computed 3 trillion times.

    As for the function calls, don't call a function with the same arguments many times, such as eulerb( 2, ii, teta, phi, psi ). You can do:

    double foo = eulerb( 2, ii, teta, phi, psi );

    and then use this foo to replace the same function calls.
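Taking the same idea one step further, here is a sketch under an assumption read off the posted expressions: each term looks like a product of one bracket that depends only on ii, jj, kk, ll and the angles and one bracket that depends only on mm, nn, oo, pp and the angles. If that holds, the sum over all index combinations at a fixed angle equals the product of two much smaller sums, which turns 81 * 81 = 6561 products per angle into 81 + 81 = 162 evaluations. The functions `factorF` and `factorG` below are made-up stand-ins for the two brackets, with a single index each to keep the example short.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the two bracketed factors of one term: F depends
// only on the first index group and the angle, G only on the second group.
double factorF(int i, double angle) { return std::cos(angle + i); }
double factorG(int m, double angle) { return std::sin(angle * (m + 1)); }

// Original shape: index loops outside, angle loop inside.
double bruteForce(int nIdx, int nAngles, double step)
{
    assert(nIdx >= 0 && nAngles >= 0);
    double total = 0.0;
    for (int i = 0; i < nIdx; ++i)
        for (int m = 0; m < nIdx; ++m)
            for (int a = 0; a < nAngles; ++a) {
                const double angle = a * step;
                total += std::sin(angle) * factorF(i, angle) * factorG(m, angle);
            }
    return total;
}

// Reordered: for each angle, sum F and G separately, then multiply once.
double factorised(int nIdx, int nAngles, double step)
{
    assert(nIdx >= 0 && nAngles >= 0);
    double total = 0.0;
    for (int a = 0; a < nAngles; ++a) {
        const double angle = a * step;
        double sumF = 0.0;
        double sumG = 0.0;
        for (int i = 0; i < nIdx; ++i) sumF += factorF(i, angle);
        for (int m = 0; m < nIdx; ++m) sumG += factorG(m, angle);
        total += std::sin(angle) * sumF * sumG;
    }
    return total;
}
```

Both functions compute the same mathematical sum; the results can differ in the last few digits because the floating-point additions happen in a different order.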

    Have fun

  19. #17
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: Maximum Speed Needed

    You can even use the "register" keyword to hint the compiler about most often used variables.


  20. #18
    Join Date
    Jan 2006
    Location
    travelling
    Posts
    1,116
    Thanks
    8
    Thanked 127 Times in 121 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Quote Originally Posted by wysota View Post
    I would do such computations in assembly if I were you...
    You're going to scare him to death with such advice. Besides, well-written C code can perform pretty well and is a lot more maintainable and portable than assembly.

    Quote Originally Posted by wysota View Post
    You can even use the "register" keyword to hint the compiler about most often used variables.
    This would actually be a bad idea given the number of most used variables and the fact that modern compilers are typically smarter about guessing an optimal register allocation strategy than the user could ever be.

  21. #19
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,368
    Thanks
    3
    Thanked 5,017 Times in 4,793 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: Maximum Speed Needed

    Quote Originally Posted by fullmetalcoder View Post
    Besides, well-written C code can perform pretty well and is a lot more maintainable and portable than assembly.
    I'm guessing he's not much concerned about portability here, and it's obviously some formula that is unlikely to change, so the benefit of using assembly (or even NVIDIA's CUDA technology) is huge.

    This would actually be a bad idea given the number of most used variables and the fact that modern compilers are typically smarter about guessing an optimal register allocation strategy than the user could ever be.
    I would not agree here.


  22. #20
    Join Date
    Jan 2006
    Location
    travelling
    Posts
    1,116
    Thanks
    8
    Thanked 127 Times in 121 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Default Re: Maximum Speed Needed

    Quote Originally Posted by wysota View Post
    I would not agree here.
    Well, maybe some individuals are able to best the compiler regarding register allocation strategy, but then it is very likely that they are able to produce more optimized assembly code, which renders the register keyword useless.

    AFAIK careful use of the const and restrict keywords (with strict aliasing enabled for the latter to be of any use) gives the compiler pretty good hints in such a situation, and register is unlikely to make a big difference.
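For illustration, a minimal sketch of what such hints can look like. Note that restrict is a C99 keyword; in C++ it is only available as a compiler extension, spelled `__restrict` here, which GCC, Clang and MSVC accept. The function `addScaled` is a made-up example, not code from the thread.

```cpp
#include <cassert>
#include <cstddef>

// __restrict promises that dst and src never overlap, so the compiler may
// keep loaded values in registers and vectorise the loop. Breaking that
// promise is undefined behaviour, hence the sanity check below.
void addScaled(double *__restrict dst, const double *__restrict src,
               const double scale, const std::size_t count)
{
    assert(dst != src || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        dst[i] += scale * src[i];
}
```

Whether this makes a measurable difference depends on the compiler and the loop; comparing timings with and without the qualifier is the only reliable check.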

    Quote Originally Posted by wysota View Post
    I'm guessing he's not much concerned about portability here, and it's obviously some formula that is unlikely to change, so the benefit of using assembly (or even NVIDIA's CUDA technology) is huge.
    True enough, but given the difficulty he had writing it in C, I doubt that he will be tempted to learn assembly. CUDA might be a viable alternative, as it does not require the use of assembly AFAIK, but its biggest advantage is that it offers massive parallelization, and the algorithm above looks like an iterative approximation to me, so I'm not sure parallelizing it will be easy...
