Could you post some code?
I am sure this can be solved by optimizing the code itself rather than by experimenting with GCC flags.

Regards