How to find linear part of measurement data? Calculate linear equation?



HappyCoder
1st June 2016, 13:01
Hello,

I have xy measurement data that looks like this:

[Attachment 11962: plot of the measurement data]

I want to find the linear part of the curve and then draw or calculate the red linear function.
I have never done this before and need some hints on how to do it.

Is there a library (LGPL) that can do this?
Is there an algorithm for this?

Any help or hints are welcome.
Thx

ChrisW67
1st June 2016, 23:07
What you want to do is called linear regression (https://en.m.wikipedia.org/wiki/Linear_regression) and you probably should start with the ordinary least squares (http://setosa.io/ev/ordinary-least-squares-regression/) approach.

You might find some (heavy duty) library options here: http://stackoverflow.com/questions/2197623/least-squares-regression-in-c-c
I cannot put my hands on a simple function in C right at the moment.

d_stranz
2nd June 2016, 18:27
I agree with ChrisW67 - start with an ordinary least squares approach. Google for that and you'll find a googleplex of hits. Wikipedia will probably be the clearest place for an explanation.

Least squares fits a straight line to a set of data points in such a way that the total squared deviation (the sum, over all points, of the squared difference between the actual data value and the fitted value) is a minimum.
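Written out, the quantity being minimized and its closed-form solution look like this. The notation is chosen here for illustration and is not from the posts above: n points (x_i, y_i), a line y = a + b x, and x-bar, y-bar for the mean values.

```latex
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2

b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

The slope b is undefined when every x_i is the same, because the denominator is then zero.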

In addition to the error, the least squares method also lets you calculate a regression coefficient (strictly speaking the coefficient of determination, R², since "regression coefficient" usually means the fitted slope). This is a measure of how well the data fall on a straight line. A regression coefficient of 1.0 means a perfect fit, 0 means no fit. The least squares method will produce the equation of a line for any data (except something pathological, where all the points are the same, for example). However, if the data aren't really linear, then you will get a poor regression coefficient.
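A minimal C++ sketch of such a fit, assuming the data sit in two equal-length vectors. The names `LineFit` and `fitLine` are made up for illustration, and the "regression coefficient" described above is computed here as the coefficient of determination R². Returning R² = 1.0 when all y values are identical is a convention chosen for this sketch, since the value is formally undefined in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a straight-line fit y = slope * x + intercept.
struct LineFit {
    double slope = 0.0;
    double intercept = 0.0;
    double r2 = 0.0;     // coefficient of determination, 1.0 = perfect fit
    bool valid = false;  // stays false for degenerate input
};

// Ordinary least squares fit over all points (hypothetical helper).
LineFit fitLine(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    LineFit fit;
    const std::size_t n = x.size();
    if (n < 2) return fit;  // a line needs at least two points

    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    // Centered sums of squares and cross products.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }

    if (sxx == 0.0) return fit;  // pathological: all x values identical

    fit.slope = sxy / sxx;
    fit.intercept = meanY - fit.slope * meanX;
    // For a straight-line fit, R^2 equals the squared correlation coefficient.
    // Assumption of this sketch: a perfectly flat data set counts as R^2 = 1.
    fit.r2 = (syy == 0.0) ? 1.0 : (sxy * sxy) / (sxx * syy);
    fit.valid = true;
    return fit;
}
```

Callers should check `valid` before using the slope, since the pathological case mentioned above produces no usable line.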

The trick in your case is that near the origin, your data deviate strongly from a straight line. Since you are interested in finding the line that best fits the linear part of your data, you need to tell the least squares algorithm to ignore the non-linear part of your curve. How do you do that? By first calculating the fit using all the data, calculating the regression coefficient, and then iteratively recalculating the fit as you slowly back away from the origin, dropping one point at a time from the fit calculation, until the regression coefficient improves to an acceptable value (i.e. closer to 1.0). You might implement something that compares the current coefficient to the last one, and if it has improved by more than "x" percent, keeps going; otherwise it stops and says the fit is good.
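The procedure above could be sketched roughly as follows in C++. Everything named here (`TrimmedFit`, `fitFrom`, `fitLinearPart`, `minGain`, `minPoints`) is hypothetical. The sketch assumes the points are sorted by x with the non-linear part at the start, uses R² as the coefficient being compared, and treats "x percent" as a relative gain with an arbitrary default of 0.1 %.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TrimmedFit {
    std::size_t first = 0;  // index of the first point still used in the fit
    double slope = 0.0;
    double intercept = 0.0;
    double r2 = 0.0;        // coefficient of determination of that fit
};

// Least squares fit using only the points with index >= first.
static TrimmedFit fitFrom(const std::vector<double>& x,
                          const std::vector<double>& y, std::size_t first)
{
    TrimmedFit f;
    f.first = first;
    const double n = static_cast<double>(x.size() - first);
    double sumX = 0.0, sumY = 0.0;
    for (std::size_t i = first; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
    }
    const double meanX = sumX / n, meanY = sumY / n;
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (std::size_t i = first; i < x.size(); ++i) {
        const double dx = x[i] - meanX, dy = y[i] - meanY;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    if (sxx > 0.0) {  // otherwise leave slope and r2 at 0 (degenerate data)
        f.slope = sxy / sxx;
        f.intercept = meanY - f.slope * meanX;
        f.r2 = (syy > 0.0) ? (sxy * sxy) / (sxx * syy) : 1.0;
    }
    return f;
}

// Drops leading points one at a time for as long as R^2 improves by more
// than the relative amount minGain, always keeping at least minPoints.
TrimmedFit fitLinearPart(const std::vector<double>& x,
                         const std::vector<double>& y,
                         double minGain = 0.001, std::size_t minPoints = 5)
{
    assert(x.size() == y.size());
    assert(minPoints >= 2 && x.size() >= minPoints);
    TrimmedFit best = fitFrom(x, y, 0);  // start with all the data
    for (std::size_t first = 1; x.size() - first >= minPoints; ++first) {
        const TrimmedFit next = fitFrom(x, y, first);
        if (next.r2 - best.r2 <= minGain * best.r2) break;  // gain too small
        best = next;
    }
    return best;
}
```

With noisy data a single dropped point may not improve R² at all, so a rule this simple can stop too early. The threshold and the minimum number of points will likely need tuning for real measurements.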