In my opinion, this is not the way to go. This approach doubles the work: first QTextDocument generates its HTML representation, and then the script simplifies it into another HTML document. It's better to generate clean HTML in the first place, and that's easily achievable. All it takes is some brainstorming about the pitfalls that may arise, and then applying the rules you come up with while traversing the document.
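Here is a minimal sketch of the "apply rules during traversal" idea, in Python and without Qt. `Block`, `Fragment` and `CharFormat` are hypothetical stand-ins that loosely mirror QTextBlock, QTextFragment and QTextCharFormat. The three rules in the comments are only examples of pitfalls, not a complete list:

```python
from dataclasses import dataclass, field
from html import escape
from itertools import groupby


# Hypothetical, simplified model of a rich-text document (not Qt's API).
@dataclass(frozen=True)
class CharFormat:
    bold: bool = False
    italic: bool = False


@dataclass
class Fragment:
    text: str
    fmt: CharFormat = field(default_factory=CharFormat)


@dataclass
class Block:
    fragments: list[Fragment] = field(default_factory=list)


def wrap(text: str, fmt: CharFormat) -> str:
    # Apply tags in a fixed order so nesting is always well-formed.
    if fmt.italic:
        text = f"<i>{text}</i>"
    if fmt.bold:
        text = f"<b>{text}</b>"
    return text


def block_to_html(block: Block) -> str:
    parts = []
    # Rule 1: merge adjacent fragments with the same format, so two bold
    # fragments become <b>ab</b> instead of <b>a</b><b>b</b>.
    for fmt, group in groupby(block.fragments, key=lambda f: f.fmt):
        text = "".join(f.text for f in group)
        if not text:
            continue  # Rule 2: never emit empty tags.
        # Rule 3: escape text content (&, <, >) before wrapping it in tags.
        parts.append(wrap(escape(text, quote=False), fmt))
    return "<p>" + "".join(parts) + "</p>"


def document_to_html(blocks: list[Block]) -> str:
    return "\n".join(block_to_html(b) for b in blocks)


bold = CharFormat(bold=True)
doc = [Block([Fragment("Hello, "), Fragment("bold", bold),
              Fragment(" text", bold), Fragment(" & more")])]
print(document_to_html(doc))  # <p>Hello, <b>bold text</b> &amp; more</p>
```

With real Qt, the outer loop would walk blocks from `QTextDocument::begin()` to `end()` and the inner loop would use `QTextBlock::iterator` to visit each fragment and read its `charFormat()`; the rules stay the same. Swapping `wrap` for a different emitter is also how the same traversal could target another markup.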
I'm thinking of developing an output generator for the MediaWiki syntax. Currently I don't have time to do it, but maybe in the near future I will.