I just wanted to say that the problem is finally solved, but it turned out to be a lot more complex than I had expected.

Without going into too much detail, it turns out that there is a problem with aligning 16-byte SSE2 variables inside a C++ class with the Microsoft compiler.

So jacek, I am guessing that you are not using a Microsoft compiler, since you did not get any errors with the little test program I posted. I am even guessing you are not using anything Microsoft (like Visual Studio) at all...?

The solution was to overload the "new" operator so that objects of the class are allocated with the proper 16-byte alignment.
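
For anyone who finds this thread later, here is a minimal sketch of that kind of fix. It is not the exact code from the project, only an illustration under a few assumptions: the class name Particle and the helper is_aligned16 are made up, and _mm_malloc / _mm_free (available through the SSE headers) are just one way to get aligned storage. The idea is that a class-specific operator new asks for 16-byte aligned memory, so the SSE2 member always lands on a 16-byte boundary when the object is created on the heap.

```cpp
#include <cstddef>      // std::size_t
#include <cstdint>      // std::uintptr_t
#include <new>          // std::bad_alloc
#include <emmintrin.h>  // SSE2 types; also pulls in _mm_malloc / _mm_free

// Hypothetical helper: true if the address is a multiple of 16.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical class with a 16-byte SSE2 member.
class Particle {
public:
    __m128d position;  // needs a 16-byte aligned address

    // Class-specific operator new: request 16-byte aligned storage
    // instead of relying on the default heap alignment.
    static void* operator new(std::size_t size) {
        void* p = _mm_malloc(size, 16);
        if (!p) {
            throw std::bad_alloc();
        }
        return p;
    }

    // Memory from _mm_malloc must be released with _mm_free.
    static void operator delete(void* p) noexcept {
        _mm_free(p);
    }
};
```

With this in place, "new Particle" returns a pointer for which is_aligned16 holds, and "delete" releases it through the matching _mm_free. The sketch only covers single objects; arrays created with new[] would need operator new[] and operator delete[] handled the same way.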