Thread: Maximum Speed Needed

  1. #1
    Re: Maximum Speed Needed

    Hi fullmetalcoder,

    Thank you for your help, but unfortunately the run still takes 15 minutes (my CPU flags include sse and sse2).

    It's very strange!
    Now, about multiple threads: in my case the "thread" is very simple, just a nest of for() loops like the code below:

    Qt Code:
    for (i=0;i<12;i++) {compt[i][0]=0;}
    for (ii=0;ii<=2;ii=ii+1)
    {
    for (i=0;i<12;i++) {compt[i][1]=0;}
    for (jj=0;jj<=2;jj=jj+1)
    {
    for (i=0;i<12;i++) {compt[i][2]=0;}
    for (kk=0;kk<=2;kk=kk+1)
    {
    for (i=0;i<12;i++) {compt[i][3]=0;}
    for (ll=0;ll<=2;ll=ll+1)
    {
    for (i=0;i<12;i++) {compt[i][4]=0;}
    for (mm=0;mm<=2;mm=mm+1)
    {
    for (i=0;i<12;i++) {compt[i][5]=0;}
    for (nn=0;nn<=2;nn=nn+1)
    {
    for (i=0;i<12;i++) {compt[i][6]=0;}
    for (oo=0;oo<=2;oo=oo+1)
    {
    for (i=0;i<12;i++) {compt[i][7]=0;}
    for (pp=0;pp<=2;pp=pp+1)
    {

    for (i=0;i<12;i++) {compt[i][8]=0;}
    for (psi=0;psi<2*M_PI;psi=psi+step)
    {
    for (i=0;i<12;i++) {compt[i][9]=0;}
    for (phi=0;phi<2*M_PI;phi=phi+step)
    {
    for (i=0;i<12;i++) {compt[i][10]=0;}
    for (teta=0;teta<M_PI;teta=teta+step)
    { //*V 90
    compt[0][10]=compt[0][10] + sin(teta)*(0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]));

    compt[1][10]=compt[1][10] + sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp])
    + 2 *sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);

    compt[2][10]=compt[2][10] + sin(teta)*0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(0,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(0,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);
    //*H 90
    compt[3][10]=compt[3][10] + sin(teta)*(0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(0,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]));
    compt[4][10]=compt[4][10] + sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(0,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp])
    + 0.5*sin(teta)*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(0,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(0,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);
    compt[5][10]=compt[5][10] + sin(teta)*0.5*step*step*step*(eulerb(2,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll]
    + eulerb(1,ii,teta,phi,psi)*eulerb(2,jj,teta,phi,psi)*eulerb(1,kk,teta,phi,psi)*eulerb(1,ll,teta,phi,psi)*gamma_a[ii][jj][kk][ll])
    * (eulerb(2,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]
    + eulerb(1,mm,teta,phi,psi)*eulerb(2,nn,teta,phi,psi)*eulerb(1,oo,teta,phi,psi)*eulerb(1,pp,teta,phi,psi)*gamma_a[mm][nn][oo][pp]);
    //Cancel process if stop requested
    if(isTerminated){goto end;}

    }
    for (i=0;i<12;i++) {compt[i][9]=compt[i][9]+compt[i][10];}
    }
    for (i=0;i<12;i++) {compt[i][8]=compt[i][8]+compt[i][9];}
    }
    for (i=0;i<12;i++) {compt[i][7]=compt[i][7]+compt[i][8];}
    }
    for (i=0;i<12;i++) {compt[i][6]=compt[i][6]+compt[i][7];}
    }
    for (i=0;i<12;i++) {compt[i][5]=compt[i][5]+compt[i][6];}
    }
    for (i=0;i<12;i++) {compt[i][4]=compt[i][4]+compt[i][5];}
    }
    for (i=0;i<12;i++) {compt[i][3]=compt[i][3]+compt[i][4];}
    }

    //Increment ProgressBar + set number in %
    if(DipCoefChecked){ percent++; emit progbar(percent);}
    else{ percent +=2; emit progbar(percent); }

    for (i=0;i<12;i++) {compt[i][2]=compt[i][2]+compt[i][3];}
    }
    for (i=0;i<12;i++) {compt[i][1]=compt[i][1]+compt[i][2];}
    }
    for (i=0;i<12;i++) {compt[i][0]=compt[i][0]+compt[i][1];}
    }
    Last edited by lixo1; 20th March 2009 at 18:19. Reason: spelling error

  2. #2
    Re: Maximum Speed Needed

    So you added the flags to your project and it did not change anything, right? Well, did you

    1. rebuild your whole app (AFAIK make's dependency check does not notice a change of compiler flags)
    2. check that these flags were actually passed to the compiler (a quick look at the compile log would be enough)? It could be something as simple as a spelling error in the variable name.
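
    Besides reading the compile log, the second check can also be made from inside the program. The sketch below assumes GCC or Clang, which predefine `__SSE2__` when SSE2 code generation is enabled (on 64-bit x86 targets it normally is by default); the MSVC macros are included as a fallback. The function names `builtWithSse2` and `sse2Report` are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports whether this translation unit was compiled
// with SSE2 code generation enabled.
// GCC and Clang predefine __SSE2__ in that case; MSVC signals it through
// _M_X64 (64-bit builds) or _M_IX86_FP >= 2 (32-bit builds with /arch:SSE2).
constexpr bool builtWithSse2()
{
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return true;
#else
    return false;
#endif
}

// Human-readable form, e.g. for one line in the application's start-up log.
std::string sse2Report()
{
    return builtWithSse2() ? "SSE2: enabled" : "SSE2: not enabled";
}
```

    If the report says "not enabled" after a full rebuild, the flag probably never reached the compiler.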

    Maybe I'm a bit touchy, but your code is among the worst formatted I have ever seen, and this has a dramatic effect on readability...

    Another important fact is that you use HUGE assignments in the inner loop, which are probably not very well optimized by GCC because it cannot make any assumptions about the values returned by sin() and eulerb() (specifically, it cannot guess that several calls to these with the exact same parameters will return the same result). Besides, most if not all of your assignments could be changed to use the += operator (for readability and to make sure the compiler optimizes as much as possible). You could also compute step*step*step only once, outside the loop (making it const could help optimizations as well).

    Suggested refactoring (which I am not entirely doing for you):
    Qt Code:
    for (i=0;i<12;i++)
      {compt[i][0]=0;}

    const double stepCube = step * step * step;

    for (ii=0;ii<=2;ii=ii+1)
    {
      for (i=0;i<12;i++)
        {compt[i][1]=0;}

      for (jj=0;jj<=2;jj=jj+1)
      {
        for (i=0;i<12;i++)
          {compt[i][2]=0;}

        for (kk=0;kk<=2;kk=kk+1)
        {
          for (i=0;i<12;i++)
            {compt[i][3]=0;}

          for (ll=0;ll<=2;ll=ll+1)
          {
            for (i=0;i<12;i++)
              {compt[i][4]=0;}

            const double gammaIJKL = gamma_a[ii][jj][kk][ll];
            for (mm=0;mm<=2;mm=mm+1)
            {
              for (i=0;i<12;i++)
                {compt[i][5]=0;}

              for (nn=0;nn<=2;nn=nn+1)
              {
                for (i=0;i<12;i++)
                  {compt[i][6]=0;}

                for (oo=0;oo<=2;oo=oo+1)
                {
                  for (i=0;i<12;i++)
                    {compt[i][7]=0;}

                  for (pp=0;pp<=2;pp=pp+1)
                  {
                    for (i=0;i<12;i++)
                      {compt[i][8]=0;}

                    const double gammaMNOP = gamma_a[mm][nn][oo][pp];
                    for (psi=0;psi<2*M_PI;psi=psi+step)
                    {
                      for (i=0;i<12;i++)
                        {compt[i][9]=0;}

                      for (phi=0;phi<2*M_PI;phi=phi+step)
                      {
                        for (i=0;i<12;i++)
                          {compt[i][10]=0;}

                        for (teta=0;teta<M_PI;teta=teta+step)
                        { //*V 90
                          const double sinTheta = sin(teta);
                          const double stsc = sinTheta * stepCube;
                          const double euler1i = eulerb(1,ii,teta,phi,psi);
                          const double euler2i = eulerb(2,ii,teta,phi,psi);
                          const double euler1i2i = euler1i + euler2i;

                          const double euler0j = eulerb(0,jj,teta,phi,psi);
                          const double euler0k = eulerb(0,kk,teta,phi,psi);
                          const double euler0l = eulerb(0,ll,teta,phi,psi);

                          const double euler1j = eulerb(1,jj,teta,phi,psi);
                          const double euler1k = eulerb(1,kk,teta,phi,psi);
                          const double euler1l = eulerb(1,ll,teta,phi,psi);

                          const double euler2j = eulerb(2,jj,teta,phi,psi);
                          const double euler2k = eulerb(2,kk,teta,phi,psi);
                          const double euler2l = eulerb(2,ll,teta,phi,psi);

                          // Not in the original snippet: the mm/nn/oo/pp counterparts used below
                          const double euler1m2m = eulerb(1,mm,teta,phi,psi) + eulerb(2,mm,teta,phi,psi);
                          const double euler0n = eulerb(0,nn,teta,phi,psi);
                          const double euler0o = eulerb(0,oo,teta,phi,psi);
                          const double euler0p = eulerb(0,pp,teta,phi,psi);

                          compt[0][10] += stsc * 0.5 *
                            euler1i2i*euler0j*euler0k*euler0l*gammaIJKL
                            *
                            euler1m2m*euler0n*euler0o*euler0p*gammaMNOP;
                          ...
    Of course, since the step is fixed during the whole loop, you could (and probably should) precompute most of the values (sinTheta, eulerXXX...) and put them in a vector or hash (for fast lookup), then reuse them in the inner loop instead of recomputing them every time.

    With all these optimizations I'm willing to bet you can go below one minute, despite your algorithm being rather suboptimal.
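
    One possible reason to call the algorithm suboptimal, offered here as an observation rather than something stated in the thread: each summed term is a product of one factor that depends only on (ii,jj,kk,ll) and one that depends only on (mm,nn,oo,pp). At a fixed (teta, phi, psi) the eight nested index loops (3^8 = 6561 products) therefore collapse into a single four-index sum (81 terms) that is then squared. The sketch below checks this for the compt[0] term only, without the common weight 0.5*sin(teta)*step^3. `Mat3` stands in for the values eulerb(a, b, teta, phi, psi) at one grid point and `Tensor4` for gamma_a; both are filled with arbitrary test numbers, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;  // e[a][b] ~ eulerb(a, b, ...)
using Tensor4 =
    std::array<std::array<std::array<std::array<double, 3>, 3>, 3>, 3>;  // ~ gamma_a

// Factor shared by both halves of the compt[0] term:
// (eulerb(2,i) + eulerb(1,i)) * eulerb(0,j) * eulerb(0,k) * eulerb(0,l) * gamma[i][j][k][l]
double term(const Mat3 &e, const Tensor4 &g, int i, int j, int k, int l)
{
    return (e[2][i] + e[1][i]) * e[0][j] * e[0][k] * e[0][l] * g[i][j][k][l];
}

// Eight nested loops, as in the original code: 3^8 = 6561 products.
double bruteForce(const Mat3 &e, const Tensor4 &g)
{
    double sum = 0.0;
    for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
    for (int k = 0; k < 3; ++k)
    for (int l = 0; l < 3; ++l)
    for (int m = 0; m < 3; ++m)
    for (int n = 0; n < 3; ++n)
    for (int o = 0; o < 3; ++o)
    for (int p = 0; p < 3; ++p)
        sum += term(e, g, i, j, k, l) * term(e, g, m, n, o, p);
    return sum;
}

// Same value from one four-index sum (81 terms), squared.
double factorized(const Mat3 &e, const Tensor4 &g)
{
    double s = 0.0;
    for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
    for (int k = 0; k < 3; ++k)
    for (int l = 0; l < 3; ++l)
        s += term(e, g, i, j, k, l);
    return s * s;
}

// Fills e and gamma with arbitrary deterministic numbers and compares both
// results up to floating-point rounding.
bool factorizationHolds()
{
    Mat3 e{};
    Tensor4 g{};
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b)
            e[a][b] = std::sin(1.0 + a + 3 * b);
    for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
    for (int k = 0; k < 3; ++k)
    for (int l = 0; l < 3; ++l)
        g[i][j][k][l] = std::cos(0.5 * (i + 3 * j + 9 * k + 27 * l));

    const double slow = bruteForce(e, g);
    const double fast = factorized(e, g);
    assert(std::isfinite(slow) && std::isfinite(fast));
    return std::fabs(slow - fast) <= 1e-9 * (1.0 + std::fabs(slow));
}
```

    The other compt terms are sums of products of the same shape, so the same argument should apply to each product separately; that has not been checked here.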
    Current Qt projects : QCodeEdit, RotiDeCode

  3. The following user says thank you to fullmetalcoder for this useful post:

    lixo1 (20th March 2009)

  4. #3
    Re: Maximum Speed Needed

    I would do such computations in assembly if I were you... If you have a multicore or multi-CPU machine available, try parallelizing the computation. Furthermore, replace the for loops with something faster; the first loop can be replaced by memset(), for example. Also optimize your code to make proper use of your processor's data cache. Cache partial results: there is no need to recompute sin(teta) and all the other stuff all the time.

    And lose the goto
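
    A minimal sketch of the memset() suggestion. memset() only works on contiguous memory, so this assumes the accumulator array is stored level-major as `acc[level][term]`, i.e. transposed relative to the original compt[i][level]. The struct and function names are hypothetical, and the gain is likely small next to the cost of the innermost loop.

```cpp
#include <cassert>
#include <cstring>

constexpr int kLevels = 11;  // nesting levels 0..10, as in the original compt
constexpr int kTerms = 12;   // accumulated quantities per level

// Assumed layout: the 12 values of one level sit next to each other in memory.
struct Accumulators {
    double acc[kLevels][kTerms];
};

// Zero one level in a single call instead of a 12-iteration loop.
// All-bits-zero represents 0.0 for IEEE 754 doubles, which this relies on.
void clearLevel(Accumulators &a, int level)
{
    assert(level >= 0 && level < kLevels);
    std::memset(a.acc[level], 0, sizeof a.acc[level]);
}

// Fold a finished inner level into its parent, as the original
// "compt[i][k] = compt[i][k] + compt[i][k+1]" loops do.
void foldIntoParent(Accumulators &a, int level)
{
    assert(level >= 1 && level < kLevels);
    for (int t = 0; t < kTerms; ++t)
        a.acc[level - 1][t] += a.acc[level][t];
}
```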
    Your biological and technological distinctiveness will be added to our own. Resistance is futile.

    Please ask Qt related questions on the forum and not using private messages or visitor messages.


  5. The following user says thank you to wysota for this useful post:

    lixo1 (20th March 2009)

  6. #4
    Re: Maximum Speed Needed

    Hi fullmetalcoder,

    Thank you very much for all your patience, help and the example; I will modify all the code.
    I'm an amateur (a physicist, in fact), so I'm very sorry for the chaotic readability.

    wysota > And lose the goto
    What do you recommend using instead?
    I don't use "return;" because I delete my arrays after the loop.

    Cheers,
    Louis

  7. #5
    Re: Maximum Speed Needed

    Quote: Originally posted by lixo1
    wysota > And lose the goto
    What do you recommend using instead?
    I don't use "return;" because I delete my arrays after the loop.
    There are numerous ways to do it. The easiest one is to allocate all the data on the stack and not on the heap (the latter using the "new" operator). Another is to delete the data using an external function call and then call return. You can also use the "break" keyword to bail out of a loop.
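
    A sketch of one way to combine these suggestions: keep the buffer in a std::vector (used here in place of new[]/delete[], so it is freed automatically on every exit path) and put the loop nest in its own function, so that a plain return leaves all loops at once. `stopRequested` is a hypothetical stand-in for the isTerminated flag, and `integrate` and its trivial loop body are placeholders, not the original computation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the thread's isTerminated flag.
std::atomic<bool> stopRequested{false};

// Returns false if the run was cancelled, true otherwise.
// result is only written when the run completes.
bool integrate(int n, double &result)
{
    assert(n > 0);
    // Replaces a new[]/delete[] pair: released automatically on every return.
    std::vector<double> scratch(static_cast<std::size_t>(n), 1.0);

    double sum = 0.0;
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            if (stopRequested.load())
                return false;  // leaves both loops; no cleanup label needed
            sum += scratch[a] * scratch[b];
        }
    }
    result = sum;
    return true;
}
```

    The caller can then decide what to do on cancellation without any goto.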

  8. #6
    Re: Maximum Speed Needed

    You are computing the same thing a billion times.

    For instance, you put sin( teta ) in the innermost loop, so the same sin( teta ) values are recomputed on every pass of every outer loop. You can fill an array of sin( teta ) values once, up front. sin() is a very slow operation.

    Also, step * step * step is another constant, yet it is computed 3 trillion times.

    When calling functions, don't call the same function with the same arguments many times, as with eulerb( 2, ii, teta, phi, psi ). You can do:

    double foo = eulerb( 2, ii, teta, phi, psi );

    and then use foo in place of the repeated calls.
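
    A sketch of the sin( teta ) array idea. It assumes the angular loops are rewritten with an integer counter k, so that teta = k * step and the table can be indexed directly. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Number of grid points visited by "for (x = 0; x < upper; x += step)",
// ignoring rounding drift from the repeated floating-point addition.
int gridCount(double upper, double step)
{
    assert(upper > 0.0 && step > 0.0);
    return static_cast<int>(std::ceil(upper / step));
}

// Builds sin(k * step) for k = 0 .. count-1, once, before the loop nest.
std::vector<double> makeSinTable(double step, int count)
{
    assert(step > 0.0 && count > 0);
    std::vector<double> table(static_cast<std::size_t>(count));
    for (int k = 0; k < count; ++k)
        table[static_cast<std::size_t>(k)] = std::sin(k * step);
    return table;
}
```

    Inside the loop nest, sin(teta) then becomes a lookup such as sinTable[k]. The same pattern would apply to the eulerb() values, at the cost of a larger table.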

    Have fun

  9. #7
    Re: Maximum Speed Needed

    You can even use the "register" keyword to hint the compiler about most often used variables.

  10. #8
    Re: Maximum Speed Needed

    Quote: Originally posted by wysota
    I would do such computations in assembly if I were you...
    You're going to scare him to death with such advice. Besides, well-written C code can perform pretty well and is a lot more maintainable and portable than assembly.

    Quote: Originally posted by wysota
    You can even use the "register" keyword to hint the compiler about most often used variables.
    This would actually be a bad idea, given the number of heavily used variables and the fact that modern compilers are typically smarter about finding an optimal register allocation strategy than the user could ever be.

  11. #9
    Re: Maximum Speed Needed

    Quote: Originally posted by fullmetalcoder
    Besides, well-written C code can perform pretty well and is a lot more maintainable and portable than assembly.
    I'm guessing he doesn't care much about portability here, and it's obviously some formula that is unlikely to change, so the benefit of using assembly (or even NVIDIA's CUDA technology) is huge.

    Quote: Originally posted by fullmetalcoder
    This would actually be a bad idea, given the number of heavily used variables and the fact that modern compilers are typically smarter about finding an optimal register allocation strategy than the user could ever be.
    I would not agree here.

  12. #10
    Re: Maximum Speed Needed

    Quote: Originally posted by wysota
    I would not agree here.
    Well, maybe some individuals are able to best the compiler at register allocation, but then it is very likely that they are able to produce more optimized assembly code, which renders the register keyword useless.

    AFAIK careful use of the const and restrict keywords (with strict aliasing enabled for the latter to be of any use) gives the compiler pretty good hints in such a situation, and register is unlikely to make a big difference.
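
    A small sketch of what such hints can look like. Note that plain `restrict` is a C99 keyword; in C++ the usual spelling is the compiler extension `__restrict`, which GCC, Clang and MSVC accept. The function `addScaled` is a made-up example, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>

// out[i] += scale * in[i] for i in [0, n).
// __restrict promises the compiler that out and in never overlap, so it may
// keep values in registers and vectorize without re-reading memory after
// each store. Breaking that promise is undefined behaviour.
// const on in and scale documents that they are read-only inputs.
void addScaled(double *__restrict out, const double *__restrict in,
               const double scale, std::size_t n)
{
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        out[i] += scale * in[i];
}
```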

    Quote: Originally posted by wysota
    I'm guessing he doesn't care much about portability here, and it's obviously some formula that is unlikely to change, so the benefit of using assembly (or even NVIDIA's CUDA technology) is huge.
    True enough, but given the difficulty he had writing it in C, I doubt that he will be tempted to learn assembly. CUDA might be a viable alternative, as it does not require assembly AFAIK, but its biggest advantage is massive parallelization, and the algorithm above looks like an iterative approximation to me, so I'm not sure parallelizing it will be easy...

  13. #11
    Re: Maximum Speed Needed

    Hi guys,

    This thread is a well of knowledge!
    Thank you very much for all this high-level information. I would only like to ask, if possible, for links to some web tutorials (or just book titles) about the different strategies, because it's very difficult to find information about these technologies when you don't work with them.

    Just one thing: my application needs to be redistributable on any Windows/Linux/Mac machine.

    Thank you very much, for this great post.
    Cheers,
    Louis

  14. #12
    Re: Maximum Speed Needed

    NVidia CUDA : http://www.nvidia.com/object/cuda_home.html

    Some blog posts on restrict keyword and strict aliasing.

    I can't seem to find a link to that blog discussing the use of the register keyword so you'll have to search yourself.

  15. #13
    Re: Maximum Speed Needed

    Great, thank you very much.

  16. #14
    Re: Maximum Speed Needed

    Here are my suggestions, in order; I hope they help.
    1. Using a better algorithm is the first choice, such as a lookup table to trade space for time.
    2. Parallelism:
    (1) Parallelism across threads: I think OpenMP is a much easier solution than other multithreading techniques. However, it is not supported in the Express edition of Visual C++. It's a set of compiler directives that make your computation parallel automatically.
    (2) Parallelism at the instruction level: it's very hard for the compiler to find the parallelism in your computation even if the SIMD flags are turned on, so the assembly code these compilers generate mostly uses the SIMD instructions for sequential work. You need to write the main part of your computation yourself, in assembly language or with some platform-independent wrapper. This speeds up your computation more than multithreading does.
    (3) CUDA/OpenCL, or what might be called heterogeneous parallel computing. It's really fast. It's hard for me to learn with little knowledge of graphics hardware architecture, and you need an NVIDIA card in your computer, which makes your software hard to deploy freely.
    3. Embedded computation: ASIC/FPGA or DSP processors. OK, I'm going too far now ...
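
    A sketch of the OpenMP option from point 2 (1), applied to a single angular loop. It assumes the loop is rewritten with an integer counter, since OpenMP needs a countable loop and a floating-point "psi = psi + step" counter does not qualify. `integrand` is a hypothetical placeholder for the body of the innermost loop.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical integrand standing in for the real loop body.
double integrand(double psi)
{
    return std::cos(psi) * std::cos(psi);
}

// Riemann sum over psi in [0, 2*pi) with n equal steps.
// reduction(+:sum) gives each thread a private partial sum and adds the
// partial sums together at the end, so no locking is needed.
// When the compiler is run without OpenMP support (e.g. without -fopenmp
// on GCC or Clang), the pragma is ignored and the loop runs serially.
double integratePsi(int n)
{
    assert(n > 0);
    const double twoPi = 2.0 * std::acos(-1.0);
    const double step = twoPi / n;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int k = 0; k < n; ++k)
        sum += integrand(k * step) * step;
    return sum;
}
```

    For this placeholder integrand the result should be close to pi. The thread-level sums are added in an unspecified order, so the last digits may differ slightly between runs.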
    Last edited by songyuncen; 22nd March 2009 at 13:28. Reason: spelling error

  17. #15
    Re: Maximum Speed Needed

    Hi wysota,

    Thank you very much for the example; now I understand what I have to do.
    To start, I will implement it with QtConcurrent and measure how long it takes.
    Thank you once again,
    Cheers,
    Louis
