Fitting curves

The routine used for fitting curves is part of the scipy.optimize module and is called scipy.optimize.curve_fit(). So first said module has to be imported.

>>> import scipy.optimize

The function that you want to fit to your data has to be defined with the x values as first argument and all parameters as subsequent arguments.

>>> def parabola(x, a, b, c):
...     return a*x**2 + b*x + c
...

For test purposes the data is generated using said function with known parameters.

>>> params = [-0.1, 0.5, 1.2]
>>> x = np.linspace(-5, 5, 31)
>>> y = parabola(x, params[0], params[1], params[2])
>>> plt.plot(x, y, label='analytical')  
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend(loc='lower right')  
<matplotlib.legend.Legend object at ...>
>>> plt.show()
../_images/fitting_curves-3.png

Then some small deviations are introduced as fitting a function to data generated by itself is kind of boring.

>>> r = np.random.RandomState(42)
>>> y_with_errors = y + r.uniform(-1, 1, y.size)
>>> plt.plot(x, y_with_errors, label='sample')  
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend(loc='lower right')  
<matplotlib.legend.Legend object at ...>
>>> plt.show()
../_images/fitting_curves-4.png

Now the fitting routine can be called.

>>> fit_params, pcov = scipy.optimize.curve_fit(parabola, x, y_with_errors)

It returns two results, the parameters that resulted from the fit as well as the covariance matrix which may be used to compute some form of quality scale for the fit. The actual data for the fit may be compared to the real parameters:

>>> for param, fit_param in zip(params, fit_params):
...     print(param, fit_param)
...
-0.1 -0.0906682795944
0.5 0.472361903203
1.2 1.00514576223
>>> y_fit = parabola(x, *fit_params)
>>> plt.plot(x, y_fit, label='fit')  
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend(loc='lower right')  
<matplotlib.legend.Legend object at ...>
>>> plt.show()
../_images/fitting_curves-6.png

As you can see the fit is rather off for the third parameter c. A look at the covariance matrix also shows this:

>>> print(pcov)
[[  1.64209005e-04   1.75357845e-12  -1.45963560e-03]
 [  1.75357845e-12   1.16405938e-03  -1.73642112e-11]
 [ -1.45963560e-03  -1.73642112e-11   2.33217333e-02]]

But to get a more meaningful scale for the quality of the fit the documentation recommends the following:

>>> print(np.sqrt(np.diag(pcov)))
[ 0.01281441  0.03411831  0.15271455]

So the fit for a is quite well whereas the quality c is the worst of all the parameters.

Note

This routine works by iteratively varying the parameters and checking whether the fit got better or worse. To help the routine find the best fit it is hence a good idea to give it a good starting point. this can be done using the p0 argument of curve_fit(). In some cases this is even necessary.