I would assume most CPUs have a SIMD instruction (e.g. PADDW in MMX, which adds packed 16-bit words) that lets you add two arrays together efficiently. From a quick Google search, it seems that if you write the loop in a straightforward way and enable MMX support along with optimization when compiling, the compiler should be able to vectorize your loops using SIMD instructions.
You can start from here: http://stackoverflow.com/questions/5...e-instructions
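To give a rough idea of what "writing the code properly" might look like, here is a minimal sketch of a loop that compilers can usually auto-vectorize. The function name add_arrays and the choice of 16-bit elements (to match PADDW) are my own assumptions for illustration, not something taken from the linked thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: element-wise sum of two arrays of
   16-bit values. `restrict` promises the compiler that the three arrays do
   not overlap, which removes a common obstacle to auto-vectorization. */
void add_arrays(uint16_t *restrict dst, const uint16_t *restrict a,
                const uint16_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* The sum is truncated to 16 bits, so it wraps modulo 65536,
           the same behaviour as a packed 16-bit add. */
        dst[i] = (uint16_t)(a[i] + b[i]);
    }
}
```

With GCC, something like -O3 together with a SIMD target flag (for example -msse2) should be enough for the vectorizer to kick in, and -fopt-info-vec can report which loops were vectorized. Whether you actually get PADDW or a wider equivalent depends on the compiler and the target, so it is worth checking the generated assembly.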