I got tired of the debatable sound quality of the QSound interface under Win32, so I decided to have a go at an alternative implementation.

Here are the sources: Win32Sound.zip

The current (4.8.1) Win32 QSound relies on the PlaySound interface, which I found a bit too simple for comfort.
Basically, any new sound cancels all previous ones. This behaviour, unpleasant to the ear, is documented. Worse, when you play a lot of sounds in quick succession, the system eventually goes mute until the accumulated failed play requests are flushed out (and that can last a good long while!).

Sure, you can use Phonon for more sophisticated sound effects, but
1) it's a hassle to set up, and rather overkill for playing a handful of less-than-a-second-long beeps and clicks
2) it's not available on Win32/MinGW (which happens to be the environment I use)

Anyway, I dug out the antiquated waveOut set of functions (with the quirky handles, callbacks and all) to produce an alternative Win32Sound class whose interface is a superset of QSound's.

Improvements over QSound
1) different sounds can be played concurrently
Calling play() on the same sound will still (by design) interrupt the previous instance, but instances of other sounds will continue playing.

2) no more system overload
You can play any sound as often as you want without overloading the system into muteness. Creating too many sounds will eventually drain the available system handles dry, but once a Win32Sound object has been successfully constructed, it's guaranteed (bugs notwithstanding) to be available and to remain so until deletion.

3) snappy stop() response
Sound stopping is immediate in all cases (looping or not).

4) accurate loop count
Well, I'm not sure what loopsRemaining() is supposed to return, but the internal loop count is as accurate as the sound you hear, so the accessor may need a little tweak.

Extra features (using them might hurt portability)
1) resource files can be read directly. That lets you skip the old dump-my-.wav-resources-to-disk chore.
2) sample duration is available (in milliseconds). I needed that for the test harness.
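
For the curious, the duration of a raw PCM sample follows directly from the header fields. Here is a minimal, portable sketch of that arithmetic. The function name pcmDurationMs and its parameters are made up for illustration; this is not necessarily how Win32Sound computes it.

```cpp
#include <cassert>
#include <cstdint>

// Duration in milliseconds of an uncompressed PCM buffer.
// dataBytes     : size of the sample data in bytes
// sampleRate    : frames per second (e.g. 8000)
// channels      : 1 for mono, 2 for stereo
// bitsPerSample : 8 or 16 for plain PCM
// (Hypothetical helper, for illustration only.)
std::uint32_t pcmDurationMs(std::uint32_t dataBytes, std::uint32_t sampleRate,
                            std::uint16_t channels, std::uint16_t bitsPerSample)
{
    const std::uint32_t bytesPerFrame = channels * (bitsPerSample / 8u);
    assert(sampleRate > 0 && bytesPerFrame > 0);

    // Use 64-bit arithmetic so that frames * 1000 cannot overflow.
    const std::uint64_t frames = dataBytes / bytesPerFrame;
    return static_cast<std::uint32_t>(frames * 1000u / sampleRate);
}
```

With the 8000 Hz, 8-bit mono samples mentioned further down, 1600 bytes of data come out at 200 ms and 56000 bytes at 7 s.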

Requirements
Must be linked with the MinGW winmm library (or any wrapper around the native Win32 winmm.dll).

Limitations
1) granularity
Samples are entirely loaded into memory during instantiation, and played as whole chunks.

2) sample size
No theoretical limit, although using megabytes of ram to store raw PCM data may end up hurting a few kittens.

3) output device
The output device is the default system WAVE mapper. This could be made configurable, but that would hurt QSound compatibility even more, for a result that would never amount to more than a beggar's Phonon.

4) file format
Only the basic PCM encoding (or lack thereof!) from RIFF/WAVE files is supported.
Motorola fans should have little trouble adding RIFX support: just a few endians to swap.
On the other hand, the current QSound implementation does little or no better, and if you want to handle big sounds or smart compression it's bye-bye MinGW, hello Phonon anyway.
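
To give an idea of what "basic PCM from RIFF/WAVE" and "a few endians to swap" mean in practice, here is a rough, portable sketch of a header parser that accepts both RIFF (little-endian) and RIFX (big-endian) containers. It is an illustration written for this post, not the parser from the zip: WavInfo, parseWav and the read helpers are invented names, and it only looks at the "fmt " and "data" chunks.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fields of interest from a WAVE file (hypothetical struct, for illustration).
struct WavInfo {
    bool valid = false;
    std::uint16_t formatTag = 0;      // 1 means plain PCM
    std::uint16_t channels = 0;
    std::uint32_t sampleRate = 0;
    std::uint16_t bitsPerSample = 0;
    std::size_t dataOffset = 0;       // where the samples start in the buffer
    std::uint32_t dataBytes = 0;      // size of the sample data
};

// Read integers byte by byte, so the host CPU's endianness does not matter.
static std::uint16_t readU16(const std::uint8_t* p, bool bigEndian)
{
    return bigEndian ? std::uint16_t((p[0] << 8) | p[1])
                     : std::uint16_t((p[1] << 8) | p[0]);
}

static std::uint32_t readU32(const std::uint8_t* p, bool bigEndian)
{
    return bigEndian
        ? (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
          (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3])
        : (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
          (std::uint32_t(p[1]) << 8) | std::uint32_t(p[0]);
}

WavInfo parseWav(const std::vector<std::uint8_t>& buf)
{
    WavInfo info;
    if (buf.size() < 12) return info;

    // "RIFX" is the big-endian twin of "RIFF".
    const bool rifx = std::memcmp(buf.data(), "RIFX", 4) == 0;
    if (!rifx && std::memcmp(buf.data(), "RIFF", 4) != 0) return info;
    if (std::memcmp(buf.data() + 8, "WAVE", 4) != 0) return info;

    bool haveFmt = false, haveData = false;
    std::size_t pos = 12;
    while (buf.size() - pos >= 8) {                  // room for a chunk header
        const std::uint8_t* chunk = buf.data() + pos;
        const std::uint32_t size = readU32(chunk + 4, rifx);
        const std::size_t body = pos + 8;
        if (size > buf.size() - body) break;         // truncated chunk

        if (std::memcmp(chunk, "fmt ", 4) == 0 && size >= 16) {
            const std::uint8_t* f = buf.data() + body;
            info.formatTag     = readU16(f + 0, rifx);
            info.channels      = readU16(f + 2, rifx);
            info.sampleRate    = readU32(f + 4, rifx);
            info.bitsPerSample = readU16(f + 14, rifx);
            haveFmt = true;
        } else if (std::memcmp(chunk, "data", 4) == 0) {
            info.dataOffset = body;
            info.dataBytes = size;
            haveData = true;
        }

        // Chunks are padded to an even number of bytes.
        const std::size_t advance = std::size_t(size) + (size & 1u);
        if (advance > buf.size() - body) break;
        pos = body + advance;
    }

    info.valid = haveFmt && haveData && info.formatTag == 1;
    return info;
}
```

Note that for a RIFX file the header is not the whole story: 16-bit samples would presumably need their bytes swapped too before being handed to waveOut, whereas 8-bit samples can be used as they are.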

5) number of concurrent sounds
My tests on Windows 7 with a basic RealTek AC'97 audio chip and small samples (8000 Hz, 8 bits, mono, durations from 200 ms to 7 s) ran into trouble at around 300 to 400 simultaneously allocated sounds. Occasional driver out-of-memory errors also occurred at that point.
My guess is that handles and sample buffers share a common driver memory pool that will eventually be exhausted, either by creating lots of sounds or by playing big ones at the same time.

If handles run out, the sound will never play; it must be deleted and created anew after some resources have been freed. If memory runs out, the current play() invocation will fail but the sound will remain available for a retry.
These limits seem to be system-wide. Not surprising since the complaints originate from somewhere near the audio chip driver itself.

I reckon this test case shows a capacity more than sufficient for usual QSound uses.

6) performance
Should be faster than a six-legged jackrabbit on any system strong enough to run Windows and live to tell the tale. The overhead is less than a single signal/slot activation, and should be lost in the background noise (if I may say so).
I paid no special attention to performance, but I don't reckon there's too much useless fat in the code either, considering the limitations of this venerable Win32 interface.

Possible improvements
1) More efficient driver memory use
One could split samples into smaller chunks to spare driver resources. It would make the code a lot more complex for the benefit of playing all the songs of your giant hits discotheque album in one go. Hardly worth the effort, if you ask me.
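
For what it's worth, the splitting itself is the easy part. Below is a small sketch of it, with invented names (Chunk, splitIntoChunks) and nothing Windows-specific: it only computes the offset and length of each piece, keeping every piece a whole number of sample frames so that no frame is cut in half.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of the sample buffer (hypothetical type, for illustration).
struct Chunk {
    std::size_t offset;  // start of the piece within the sample data
    std::size_t length;  // size of the piece in bytes
};

// Split dataBytes of PCM data into pieces of at most maxChunkBytes,
// each (except possibly the last) a multiple of blockAlign,
// where blockAlign is the size of one frame (channels * bytes per sample).
std::vector<Chunk> splitIntoChunks(std::size_t dataBytes,
                                   std::size_t maxChunkBytes,
                                   std::size_t blockAlign)
{
    assert(blockAlign > 0 && maxChunkBytes >= blockAlign);

    // Round the chunk size down to a whole number of frames.
    const std::size_t chunkBytes = maxChunkBytes - (maxChunkBytes % blockAlign);

    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < dataBytes; off += chunkBytes) {
        const std::size_t len = std::min(chunkBytes, dataBytes - off);
        chunks.push_back({off, len});
    }
    return chunks;
}
```

The hard part is what comes next: each piece would need its own WAVEHDR, queued through waveOutPrepareHeader() and waveOutWrite() and recycled from the completion callback while the sound is playing, which is exactly the extra complexity I'd rather avoid.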

2) 64-bit support
Dream on! waveOut dates back to the times when DWORDs and pointers frolicked happily together in Bill's garden of Eden. Ah, those were the days...

3) other platforms
I don't have the means to test this on Linux right now. But maybe the local QSound variant is already good enough in the first place? Anyway, if someone feels like it, I'd be glad to see what a port looks like.

Tests
The included test program focuses on concurrent playing and leaves ancillary functions like loopsRemaining() aside.
stop() is not tested explicitly since it shares most of its code with start() and the destructor.
For those who care to know, the disco tune comes from Ivan Kupala - Kostroma. I found it convenient for testing looping.

Caveat
I expect the worst that could happen is a process crash if the driver behind the waveOut interface manages to slip through the back door and goes on a rampage over invalid sample memory. This has been tested, but who knows? Gremlins are always on the prowl.
Just in case, I decline all responsibility for computer, pets, ears or brain damage caused directly or indirectly by this piece of software.

Comments are very welcome. Hope some readers will find this post mildly amusing or even helpful.