
View Full Version : QAudioProbe to FFT - by QAudioBuffer



pajczur
7th May 2017, 23:12
Hello,
I'm working on my school project, and my task is to get the musical tone value (for example, for a 440 Hz sine wave it's the tone "A") in real time from an audio recorder.

From what I remember of math lessons, the fast Fourier transform would be great for solving this. But implementing it in C++ is not easy for me. I've spent some time today integrating the FFTW library into my code, and I think I am on the right track, but now I can't figure out how to send the audio signal to my FFT function.

I tried something like that:



void MainWindow::fftSlot(QAudioBuffer buffer) // it's a public slot
{
    int n = 44100;
    fftw_complex x[n]; // input
    fftw_complex y[n]; // output

    const double *data = buffer.data<double>();

    for (int i = 0; i <= n; i++)
    {
        x[i][REAL] = *data;
        x[i][IMAG] = 0;
    }

    fftw_plan myPlan = fftw_plan_dft_1d(n, x, y, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(myPlan);
}


But it doesn't work for me. When I start:
QAudioProbe *probka;
probka = new QAudioProbe;
connect(probka, SIGNAL(audioBufferProbed(QAudioBuffer)), this, SLOT(fftSlot(QAudioBuffer)));

The program crashes with info: "The program has unexpectedly finished."

I'm pretty sure it's because of that strange loop in the slot. But I have no idea how to feed the QAudioBuffer into fftw_complex x[n], or what should be there in that case: int n = ???;

Maybe anybody could help?
For any help thanks in advance.
Best Regards

d_stranz
8th May 2017, 01:04
For one thing, "n" is not the sampling frequency, it should be the number of samples in the audio buffer. For another, the audio buffer can have one of several different formats (which you get by asking the buffer for its QAudioFormat). You then need to look at the properties of the audio format to determine how many bytes there are per frame, where a "frame" contains one sample from each channel. The channel count tells you how many channels. Sample type tells you what number format is used for the samples.

When you finally have all the information you need to decode the audio buffer, you can decode it as an array of QAudioBuffer::StereoFrame frames. From the docs:



// Assuming 'buffer' is an unsigned 16 bit stereo buffer..
QAudioBuffer::S16U *frames = buffer->data<QAudioBuffer::S16U>();
for (int i = 0; i < buffer->frameCount(); i++)
{
    // frames[i].right contains the i-th sample from the right channel
    // frames[i].left contains the i-th sample from the left channel
    // if the signal is mono and not stereo, then only one of the channels will have data
}


At this point, I think you are in over your head. You should take a step back and study the audio examples from the Qt Multimedia Examples (https://doc.qt.io/qt-5/multimedia-examples.html) to see how these various classes are used, and then figure out how to extract the data into your FFT project. The Spectrum Example (https://doc.qt.io/qt-5/qtmultimedia-multimedia-spectrum-example.html) might give you some really good hints about how to do this.
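Putting the format advice and the decode loop together, the copy step from frames into an FFT input might look like the sketch below. It is illustrative only: it uses a stand-in StereoFrameU16 struct in place of QAudioBuffer::S16U and a plain std::vector<double> instead of fftw_complex, and the function name is made up.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for QAudioBuffer::S16U: one stereo frame of unsigned 16-bit samples.
struct StereoFrameU16 {
    uint16_t left;
    uint16_t right;
};

// Copy one channel into an FFT-ready array of doubles. Unsigned 16-bit audio
// is centered at 32768 (silence), so subtract that to recenter around zero.
std::vector<double> framesToFftInput(const StereoFrameU16 *frames, int frameCount)
{
    std::vector<double> input(frameCount);
    for (int i = 0; i < frameCount; ++i)
        input[i] = static_cast<double>(frames[i].left) - 32768.0;
    return input;
}
```

In the real slot, frames would come from buffer.data<QAudioBuffer::S16U>() and frameCount from buffer.frameCount(); it is input.size(), not 44100, that becomes the n passed to fftw_plan_dft_1d.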

pajczur
8th May 2017, 23:00
Hello,
great, great, great thanks for your detailed answer. But please don't finish with me yet :)

Of course I tried to study the Spectrum example from Qt Creator, even before you advised it. But I am actually quite new to programming; also, I am 37 years old and my knowledge of the FFT and of math in general is very old too. With those two handicaps at once it's almost impossible for me to go through all that code with understanding. For example, there are two *.pro files in that project, and I've never seen such a complicated solution before. I hope you will forgive me.

Thanks to your answer I understand much more. But as you probably know - the more you understand, the more things you find you don't understand.


I think it would be much clearer in points, so:
1. You mentioned the type

QAudioBuffer::S16U

It's stereo data, and for a mono signal there will be data only in one channel - I suppose in

frames[i].right

But I use a monophonic microphone (built into my laptop), and when I execute this code:


QAudioBuffer::S16U *frames = buffer->data<QAudioBuffer::S16U>();

for (int i = 0; i < buffer->frameCount(); i++)
{
    cout << frames[i].right << " right" << endl;
    cout << frames[i].left << " left" << endl;
}

I get various values, sometimes the same but most often different, and for sure neither right nor left holds constant zeros - there are zeros, but there is also a lot of data in both channels.
I don't understand why. And on the other hand, can't I just declare mono data in some way? Something like QAudioBuffer::??? *monoFrames;?

2. I am also an amateur musician and a big fan of audio recording. Until now I thought I understood the basics of the computer audio process, but now I have the impression I missed something.
As I know it, sample rate means "how many audio samples there are in one second of audio recording", and bit depth means "with what accuracy each sample is described".
But I am not sure what "buffer" means. You said "n" in my code should not be the sampling frequency (I understand you mean the sample rate here) but the number of samples in the buffer, so:

buffer->frameCount();
But isn’t it the same? I know it isn’t, but why?

My audio format properties are 44100 Hz and 16 bit. But when I

cout << buffer->frameCount();

I find it's 2048. What does that mean? Where is the connection between those values?
And, what makes it much harder to understand,

cout << buffer->frameCount();

occasionally (but very rarely) gives 4096. What does that mean?


3. And last but not least. It doesn't concern your answer exactly; it's just the next part of my primary issue, and of understanding the FFT at all. When I use this code:

fftw_plan_dft_1d(n, x, y, FFTW_FORWARD, FFTW_ESTIMATE);
As I understand it:
x - is the input data - so what exactly is it?
In my case it's an array of

QAudioBuffer::S16U

which is an array of unsigned shorts. But what exactly does each of those unsigned shorts represent?
Each unsigned short is 2 bytes, so 16 bits.
So I suppose one unsigned short is the value (amplitude) of one audio sample (one of those 44100 per second). Am I right?
But where is any value representing time? As I understand it, in the audio world the FFT input should be some wave (amplitude as a function of time), not just amplitude values. Those amplitude values make sense only if we locate them on a timeline.

So:
I suppose "n" is responsible in some way for the time value. But in what way?
"n" is the number of frames in the buffer. But now we are back to the first point: I can't find the relationship between frameCount() and the sample rate. I understand sample rate because it's some value in time, but frame count is some value in a "buffer". What is a buffer?
OK, I'll go to Google and take my lesson :) I hope you are not angry that I've written so much only to reach the "go to Google" conclusion by myself :) But when I write, it's easier to think slowly and in detail. And maybe this thread will also help somebody else.

But to the point. There is still one more question, about "y".
It's the output of the FFT.
But again: it's only one value per index. Shouldn't there be two values - some peaks as a function of frequency? What exactly does "y" represent? How do I read the represented frequency from it?

For any help thanks in advance so much.

d_stranz
9th May 2017, 00:06
You said "n" in my code should not be the sampling frequency (I understand you mean the sample rate here)

"rate" and "frequency" mean essentially the same thing in this instance - a set of values (one for each channel) is taken from the audio stream 44100 times per second. Each time your stream is sampled, that's one "frame". If it is stereo, then each frame has two values, one for the left, one for the right channel.

The "frameCount()" is the total number of frames in the audio clip. If that is 2048 and you are recording stereo, then there are 4096 numbers (and there is only 46 ms of sound in the clip).


QAudioBuffer::S16U *frames = buffer->data<QAudioBuffer::S16U>();


This will give you valid frames only if the frame format is S16U. If your data is monaural but you are getting non-random values in both channels, then it implies that maybe S16U is not the correct format, or maybe your audio stream -is- stereo despite what you assume. What does QAudioFormat::bytesPerFrame() tell you? If you run the Spectrum Analyzer project (open it from the Examples page in Qt Creator) and load your audio file, the program will tell you the format. Match that in your own code.


can’t I just declare in some way mono data?

The properties of your data are determined by how your PC records the data to the file (or streams it from the microphone). Unless you have a way to configure the audio properties of the microphone (maybe from the Control Panel if you are running Windows), you have to process the data the way it is given to you.


But where is any value representing time?

The time is implied by the frame number. If you are sampling at 44100 Hz, then frame 0 = 0 ms, frame 1 = 1 / 44100 = 0.0226 ms, frame 2 = 0.0454 ms, etc. The frames contain only amplitude data.

The FFT also uses -only- the amplitude of the signal and requires that the signal be uniformly sampled (as your audio data is). So data point 0 of the FFT input is amplitude 0 from the audio, etc. The "time" used by the FFT is the data point number. If your data was not uniformly sampled (that is, the sampling rate was not constant at 44100 Hz), then you would have to convert the data to a uniform sampling (by interpolation and/or smoothing) to get a constant time difference between points.


I can’t find relationship between frameCount() and sample rate.

The buffer contains "frameCount()" frames. Each "frame" is one sample from the audio stream. If the sample rate is 44100 Hz, then the time at which frame 1 was sampled is 1 / 44100 seconds (0.0226 ms) later than frame 0, and so forth. That is the time line. There is no direct conversion between frameCount and total time. You need the sample rate as part of the equation, and that is a property of how the recording was made.



total time = frame count / sample rate
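In numbers: the buffer duration is the frame count divided by the sample rate. A self-contained sketch using the values from this thread (the helper name is mine):

```cpp
// Duration in seconds of a buffer of 'frameCount' frames sampled at 'sampleRate' Hz.
double bufferSeconds(int frameCount, double sampleRate)
{
    return frameCount / sampleRate;
}
```

So one probed buffer of 2048 frames at 44100 Hz lasts 2048 / 44100, which is roughly 0.046 s - the ~46 ms mentioned above.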



also I am 37 years old

and I have been programming for at least 10 years longer than you have been alive... :D

pajczur
9th May 2017, 23:25
Hello again,
and again great thanks for your reply. I understand much more about audio technology, buffers, frame counts, etc.

But I am still fighting with the FFT. I tried to go through some expert lectures, but I gave up (at least for today). It's really impossible to understand it all in one day when you don't remember a lot of mathematical stuff, like integrals, imaginary numbers, Euler's formula, and some basic math symbols. I should have started learning this material years ago.

So maybe still I could ask you for more help?

On that web http://hades.mech.northwestern.edu/index.php/PIC32MX:_FFT_of_Analog_Input
I found that example:
[attachment 12458: a sine wave and, below it, its FFT] // I don't know how to make it bigger; probably I should brush up on some HTML, maybe next time? Forgive me my laziness.

And my issues in points:
1) You said the FFT uses only amplitude values and doesn't need time values as long as the signal is uniformly sampled. But actually I think of it like this: the FFT is just a math function, and it doesn't know anything about physical values like time. On the first graph I attached, the horizontal axis shows numbers from 0 to 300, which I believe represents 0.03 of a second (up to the value 250 I count 12.5 cycles, so 500 cycles correspond to 10000, and that is 500 Hz, so 1 second is represented by 10000, not by 1000 as you might think at first glance). But how does the FFT know that 1 second is represented by 10000? Those numbers could be on a different scale; for example, 1 second could be represented by the value 250, and then the frequency would be 12.5 Hz, not 500 Hz. How does the FFT know it's 500 Hz? Because of that I still think I need to provide the time value to the FFT input in some way. So where am I wrong?

2) The second issue is probably related.
I tried to imitate a sine function similar to the one in the attached screenshot:



int n = 1000;
fftw_complex in[n]; // input
fftw_complex out[n]; // output

for (int i = 0; i < n; i++)
{
    in[i][REAL] = 500 + 400 * sin((2 * M_PI * i / 20) /*-2.4*/);
    in[i][IMAG] = 0; // I don't know why zero, I found it in some tutorial.

    // I also tried this, but it gives me some strange results:
    // if (i % 2 == 0)    in[i][REAL] = 500 + 400 * sin((2 * M_PI * i / 20) /*-2.4*/);
    // if (!(i % 2 == 0)) in[i][IMAG] = 500 + 400 * sin((2 * M_PI * i / 20) /*-2.4*/);
}

fftw_plan myPlan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(myPlan);


When I plot the "in" values as a function of "i", I get a graph very similar to the first one in the enclosed picture. That is great - I understand the sine function he he he.

But which values should I plot to get the second graph? I tried this:


QVector<double> qv_x, qv_y;

for (int i = 0; i < n; i++)
{
    qv_x.append(i / 1000); // I don't know why, maybe because I thought one frame per 1000 ms? I tried various configurations with no results.
    qv_y.append(out[i][REAL]);

    ui->plot->graph(0)->setData(qv_x, qv_y);
    ui->plot->replot();
    ui->plot->update();
}


But I get some strange results. I tried various configurations: changed the axes, multiplied, switched to out[i][IMAG]. But no results. Could you give me some advice?

3. And last but not least. It's great that you have such experience. I envy you :) And as I can see, you live in California. I've never been there, but my wife and I are going on vacation to California in September. We're going so far because we believe it's a dream place. Maybe you in California need beginners like me for work - I would stay there with pleasure :) Of course I say that only half seriously, but who knows...

pajczur
10th May 2017, 09:14
HA HA!!! EUREKA!!!
I was thinking and thinking, and I got an idea. Unfortunately I am at work now, so I can't check it; we have no compiler here, and especially not one with FFTW installed. But to the point.
I think that in my case, for the FFT, one second is represented by the n-range.
So on the horizontal axis there should be "n" divided by "i", where "i" runs up to "n".
So the code that would work in my case is:


qv_x.append(n/i);

Am I right? Of course I should make some exception for i = 0.
I can't stand being at work - I want to go home and check. :)

d_stranz
10th May 2017, 18:44
But how does the FFT know that 1 second is represented by 10000?

The FFT does not know anything about "time". All it knows about is "point number". -You- know about time, and -you- know that "point # 0" is "time" 0, and "point # 1" is "time" 1 ms, etc. But you can use the FFT to transform data sampled in any unit space - for example, there are scientific instruments called interferometers where the amplitude of the signal is sampled at different distances between two mirrors. The units of x are in cm, not sec.

The FFT is simply a mathematical function that transforms an array "x" into an array "y". It is no different from a function that takes any array of numbers [0, 2, 14.5, 6, 3.7, ...] and returns another array of numbers where every number in the first array is multiplied by 2: [0, 4, 29, 12, 7.4, ...]. Now, it could be that those numbers represent the amplitude of an audio signal sampled at 0 ms, 1 ms, 2 ms, ... but the "multiply by 2" function doesn't know (and doesn't care).

The FFT is no different. The x axis of the data could be any units at all, as long as the spacing between the x values is constant, because the only thing the FFT function cares about is the y values.

The x axis units after the FFT are the inverse of the units before the FFT, so if your input data is in seconds (i.e. one sample every "m" seconds), the x axis in the FFT result is in units of Hz (frequency), and it is linear in those units, not the units of the input. If you give the FFT the amplitude of a pure sine wave sampled above the Nyquist rate, then the output will be a single peak at the frequency of the sine wave. The width of that peak is simply a function of the number of points you give to the FFT - if you give it a finite number, then it will have a finite width. If you give it an infinite length input, the output will be a zero-width spike.

So you will notice in the plot you posted, the top plot (sine wave) has units of seconds, the bottom plot (FFT output) has units of Hz (1/sec), and both plots are linear in those units - every point in the top plot represents an amplitude at each "m" second interval, and every point in the bottom plot represents the FFT result at every 1 Hz unit.
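To make the unit mapping concrete: for an n-point transform of data sampled at Fs samples per second, output bin i corresponds to the frequency i * Fs / n. The sketch below demonstrates this with a naive O(n^2) DFT instead of FFTW, so it is library-free; the function names are mine:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Naive O(n^2) DFT - mathematically the same result as an FFT, only slower.
std::vector<std::complex<double>> dft(const std::vector<double> &x)
{
    const int n = static_cast<int>(x.size());
    std::vector<std::complex<double>> y(n);
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            y[k] += x[i] * std::polar(1.0, -2.0 * M_PI * k * i / n);
    return y;
}

// Index of the strongest bin in the positive-frequency half, skipping DC (bin 0).
int peakBin(const std::vector<std::complex<double>> &y)
{
    int best = 1;
    for (int k = 2; k < static_cast<int>(y.size()) / 2; ++k)
        if (std::abs(y[k]) > std::abs(y[best]))
            best = k;
    return best;
}
```

Sampling a 500 Hz sine at Fs = 8000 Hz for n = 800 points puts the peak in bin 500 * 800 / 8000 = 50, and reading it back gives peakBin * Fs / n = 50 * 8000 / 800 = 500 Hz. The scale is supplied by -you-, via Fs, exactly as described above.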


I've never been there, but my wife and I will go on the vacation to California in September.

Well, I hope you have a great vacation. September is a very good time to come to California because it has some of the best weather of the year. Remember that California is a big place - more than 1200 km from top to bottom - with weather that can change from > 45 C to < 0 C over a distance of 200 km as you go from the desert into the mountains.

pajczur
10th May 2017, 22:58
Hello,
what can I say, just again: Thanks a lot.
I am still fighting with my FFT issue. So if I am not too annoying yet, I would like to ask one more thing: [REAL] and [IMAG] - what are they? Of course I know IMAG is short for "imaginary number" (j^2 = -1), and REAL is just "real numbers". But how do I use them? In one tutorial I found that somebody gives zeros for all instances of IMAG, something like this:


for (int i = 0; i < n; i++)
{
    input[i][REAL] = sin(2*M_PI*i/1000); // or any other input function
    input[i][IMAG] = 0; // Why zero?
}


I wonder if, for my issue, I need to put any value into [IMAG] (just to remind you: my issue is to check the musical frequency of recorded speech), or do I always use zeros for IMAG?
And for the output, do I need to check both output[i][REAL] and output[i][IMAG], or only one of them? And which one?

Maybe I should mention that in the header of my project I defined (following the tutorial, though I don't understand it):


#define REAL 0
#define IMAG 1


I suppose those two types of input/output FFT values have something to do with odd and even instances of "i", but I can't quite understand it.
For any help thanks in advance.

d_stranz
11th May 2017, 00:44
There are two types of FFT: real and complex. Theoretically, the Fourier transform is defined as a function (F) that transforms a function of a complex variable (f(x)) into another function of a complex variable (g(x) == F( f( x ) )), where the complex variable "x" has a real part and an imaginary part: (a + bi). The Fast Fourier Transform is a type of Discrete Fourier Transform, where instead of transforming a function of a complex variable, it transforms a series of complex numbers: F( [x0, x1, x2, ..., xn] ) == F( [(a0 + b0i), (a1 + b1i), ..., (an + bni)] ).

If you are sampling a real-world signal, then it has only real parts, the imaginary parts are all zero: (a + 0i).

There are usually two different implementations of the FFT in most libraries: a real FFT and a complex FFT. I do not know what these are called in fftw; it has been a long time since I looked at it. In the case of a real FFT, you only have to supply the real parts of the amplitude, because the imaginary parts are understood to be zero. The function will return to you an array containing only the real values of the transformed numbers. For a complex FFT, you have to supply both real and imaginary parts.

It depends on the library how this is implemented. In some libraries, you have two arrays, one for the real parts and another for the imaginary parts. In other libraries, you have one array of length 2n, where the even indices [0, 2, 4, ...] are for the real parts, and the odd indices [1, 3, 5, ...] are for the imaginary parts. In others, indices [0, 1, 2, ..., (n/2)-1] are for the real parts and [n/2, n/2 + 1, ..., n-1] are for the imaginary parts. You will have to look at the documentation and examples for fftw (http://www.fftw.org/fftw3_doc/One_002dDimensional-DFTs-of-Real-Data.html#One_002dDimensional-DFTs-of-Real-Data) to see what they use.

It also depends on the library how you retrieve the results of the FFT from the array it returns. Some implementations return a two-sided result with both positive and negative sides of the FFT, others return only the positive side. Again, you will have to look at the fftw documentation (http://www.fftw.org/fftw3_doc/Real_002dto_002dReal-Transforms.html#Real_002dto_002dReal-Transforms) for the function you choose.

pajczur
15th May 2017, 12:26
Hello again,
sorry for the late reply, but I had more and more problems with the FFT implementation in my project, and I felt it would be unkind to ask more questions - you have already helped me a lot.

But finally I figured it out and it works for me!!! All of that took me more than one week of really intense thinking. But when I found the solution I felt like a discoverer - a great feeling :)

And a lot of that is thanks to your support. Thank you again.
These videos were also very helpful, just to better understand the DFT and FFT in general:
https://www.youtube.com/watch?v=mkGsMWi_j4Q&t=317s
https://www.youtube.com/watch?v=htCj9exbGo0&list=WL&index=119

But now I am thinking about the next issue. My frequency recognizer works inaccurately, the values jump around like crazy, and I feel it reacts a little too slowly.

I think it's a matter of noise. So I am thinking about a low pass filter. But I am not sure how to do that.

I thought about a solution like this:
create one more FFT instance to manipulate (clean up) the output of that FFT, then use the inverse FFT, and send that inverted (clean) signal to my main (second) FFT frequency recognizer.

But after researching low pass filtering in C++ on the internet, I found that nobody even mentions the FFT. In the case of low pass filters everybody talks about some bitwise operations. But I have only a little experience with bitwise operators, so I am discouraged from diving into that kind of solution - after spending a whole week on just a simple FFT implementation.

Could you advise me anything in this case? Where should I start with my low pass filter implementation?
Great thanks in advance for any help and best regards.

d_stranz
15th May 2017, 19:58
Where should I start with my low pass filter implementation?

Read about "apodization" on Wikipedia, and follow the link there to "apodization function".

The basic procedure is this:

- perform the forward FFT on your data
- apply the apodization function to the FFT data
- perform the reverse FFT on this to get the filtered data back

The idea is that since the FFT array is in increasing frequency order, performing operations on the coefficients of certain frequencies and then transforming back will change the resulting signal. A low pass filter basically modifies the high-frequency components of the FFT array to reduce their magnitude (and therefore reduce their influence when the data is transformed back).

There are many types of low-pass filters. The simplest one is just to truncate the FFT data after some point by setting the remaining coefficients to zero.
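That truncation filter can be sketched end to end. The version below uses a naive DFT/inverse-DFT pair instead of FFTW so it is self-contained; with FFTW you would use a forward and a backward plan, remembering that FFTW's backward transform is unnormalized, so the result must be divided by n. The function name and cutoff choice are mine:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Crude FFT-based low-pass filter: transform, zero every bin above 'cutoffBin'
// (and its mirrored negative-frequency twin), then transform back.
std::vector<double> lowPass(const std::vector<double> &x, int cutoffBin)
{
    const int n = static_cast<int>(x.size());

    // Forward DFT (naive O(n^2) version; same result as an FFT).
    std::vector<std::complex<double>> y(n);
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            y[k] += x[i] * std::polar(1.0, -2.0 * M_PI * k * i / n);

    // Truncate: keep bins 0..cutoffBin and their mirror images n-cutoffBin..n-1,
    // so the filtered signal stays real after the inverse transform.
    for (int k = cutoffBin + 1; k < n - cutoffBin; ++k)
        y[k] = 0.0;

    // Inverse DFT, divided by n to undo the unnormalized forward pass.
    std::vector<double> out(n);
    for (int i = 0; i < n; ++i) {
        std::complex<double> s = 0.0;
        for (int k = 0; k < n; ++k)
            s += y[k] * std::polar(1.0, 2.0 * M_PI * k * i / n);
        out[i] = s.real() / n;
    }
    return out;
}
```

Feeding it a mix of a slow sine and a fast sine with a cutoff between the two bins returns (numerically) just the slow component - the simplest possible low-pass. A gentler apodization function would taper the coefficients instead of chopping them, which reduces ringing.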