I needed to implement parallel matrix multiplication in several programming languages, frameworks, and platforms to compare their performance.

My first implementation was in pure C using pthreads.
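For reference, a pthread version of this kind might look like the sketch below. It is an illustration under assumptions (square row-major matrices, rows split evenly across threads), not the exact code that was measured, and the names `mm_task`, `mm_worker`, and `mm_parallel` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the code that was measured. */
typedef struct {
    const double *a, *b;        /* n x n inputs, row-major */
    double *c;                  /* n x n output, row-major */
    size_t n;
    size_t row_begin, row_end;  /* half-open range of rows for this thread */
} mm_task;

static void *mm_worker(void *arg)
{
    mm_task *t = arg;
    size_t n = t->n;

    for (size_t i = t->row_begin; i < t->row_end; ++i) {
        for (size_t j = 0; j < n; ++j)
            t->c[i * n + j] = 0.0;
        /* i-k-j order reads b and writes c along rows (contiguous memory). */
        for (size_t k = 0; k < n; ++k) {
            double aik = t->a[i * n + k];
            for (size_t j = 0; j < n; ++j)
                t->c[i * n + j] += aik * t->b[k * n + j];
        }
    }
    return NULL;
}

/* Computes c = a * b using nthreads threads. Returns 0 on success, -1 on failure. */
int mm_parallel(const double *a, const double *b, double *c,
                size_t n, size_t nthreads)
{
    assert(nthreads > 0);

    pthread_t *tids = malloc(nthreads * sizeof *tids);
    mm_task *tasks = malloc(nthreads * sizeof *tasks);
    if (!tids || !tasks) {
        free(tids);
        free(tasks);
        return -1;
    }

    size_t started = 0;
    int rc = 0;
    for (size_t t = 0; t < nthreads; ++t) {
        /* Each thread writes a disjoint block of rows, so no locking is needed. */
        tasks[t] = (mm_task){ a, b, c, n,
                              n * t / nthreads, n * (t + 1) / nthreads };
        if (pthread_create(&tids[t], NULL, mm_worker, &tasks[t]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }

    /* Join only the threads that were actually created. */
    for (size_t t = 0; t < started; ++t)
        pthread_join(tids[t], NULL);

    free(tids);
    free(tasks);
    return rc;
}
```

When timing something like this against another build, the compiler flags matter: a C binary built without optimization (for example without `-O2`) can easily lose to a build that has optimization enabled, so both programs should be compiled with comparable settings.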

Afterwards, I implemented it in Qt using QThread. I expected lower performance from Qt, because I had read that QThread is implemented on top of pthreads.

To my surprise, the Qt program was 2x to 3x faster than the C application. Does anyone have an idea why this happens?