11. Least-Squares Fitting#
11.1. The method of Least Squares#
The method of least squares is based on a principle independent of that of maximum likelihood
Parameters θ are chosen to minimize the distance between the model and the data, according to a metric defined by the squared deviations of the measurements from the model, weighted by the standard deviations of the measurements
11.1.1. An immediate example#
To determine the mean \(\mu\) of a set of measurements \(x_i\), the following function can be minimized:
\[ Q^2 = \sum_{i=1}^N \left( \frac{x_i-\mu}{\sigma}\right)^2 \]
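In this simple case the minimum can be found analytically: setting to zero the derivative of \(Q^2\) with respect to \(\mu\) gives back the arithmetic mean of the measurements:
\[ \frac{\partial Q^2}{\partial \mu} = -\frac{2}{\sigma^2} \sum_{i=1}^N (x_i-\mu) = 0 \quad \Longrightarrow \quad \hat{\mu} = \frac{1}{N} \sum_{i=1}^N x_i \]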
11.1.2. The \(y=\phi(x)\) case#
The same metric is often used for data regression, also known as fitting
Let N pairs of independent measurements be given in the form \((x_i,y_i)\), for which:
the uncertainty on the value \(x_i\) is negligible or zero
the uncertainty on the value \(y_i\) is \(\sigma_i\)
Assume that the two variables \(x_i\) and \(y_i\) are related to each other through a function \(\phi\) such that \(y=\phi(x,\theta)\)
The function \(Q^2(\theta)\) is then defined as:
\[ Q^2(\theta) = \sum_{i=1}^N \left( \frac{y_i-\phi(x_i,\theta)}{\sigma_i}\right)^2 \]
11.1.3. Determination of the fit parameters \(\theta\)#
In this case, the parameters \(\theta\) (\(\theta\) may be a vector) are determined by finding the minimum of the function \(Q^2(\theta)\):
\[ \frac{\partial Q^2(\theta)}{\partial \theta_l} = 0 \]
Various numerical techniques exist to find the minimum of the function
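As an illustration of such a numerical search, the function \(Q^2(\theta)\) may be handed to a general-purpose minimizer; a minimal sketch with scipy.optimize.minimize, on hypothetical data and a straight-line model:

import numpy as np
from scipy.optimize import minimize

def Q_squared (theta, x, y, sigma_y) :
    '''sum of the squared, normalised residuals for a straight-line model'''
    residuals = (y - (theta[0] + theta[1] * x)) / sigma_y
    return np.sum (residuals**2)

# hypothetical data points, for illustration only
x = np.array ([0., 1., 2., 3., 4.])
y = np.array ([0.9, 2.1, 2.9, 4.2, 5.1])
sigma_y = 0.2 * np.ones (len (y))

result = minimize (Q_squared, x0 = [0., 0.], args = (x, y, sigma_y))
print ('estimated parameters:', result.x)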
11.1.4. Properties of the Least-Squares method#
If the residuals \(\epsilon_i\) of \(y_i\) with respect to \(\phi(x_i,\theta)\) have a zero mean and finite, constant variance, that is, independent of \(y\), and \(\phi(x_i,\theta)\) is linear in the \(\theta\) parameters, then
the least squares method is an unbiased estimator of the parameters \(\theta\)
and it has the minimum variance among all unbiased linear estimators (in \(y\)), regardless of the probability distribution of the residuals
If the residuals \(\epsilon_i\) are distributed according to a Gaussian probability density distribution, the minimum of the function \(Q^2(\theta)\) is distributed according to a \(\chi^2\) distribution with N-k degrees of freedom,
where N is the number of pairs \((x_i,y_i)\) and k is the number of estimated parameters
11.1.5. Least-Squares for linear functions#
In the case where the function \(\phi(x,\theta)\) is linear in the parameters \(\theta\), the minimization equations can be solved analytically (a worked sketch follows the note below):
\[ \phi(x,\theta) = \sum_{i=1}^{k}\theta_i h_i(x) \]
Note
An example of a linear function is the line \(\phi(x,\theta) = \theta_1 + \theta_2 x\):
\(h_1(x) = 1\)
\(h_2(x) = x\)
Another example of a linear function is a parabola \(\phi(x,\theta) = \theta_1 + \theta_2 x + \theta_3 x^2\):
\(h_1(x) = 1\)
\(h_2(x) = x\)
\(h_3(x) = x^2\)
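For instance, for the straight line \(\phi(x,\theta) = \theta_1 + \theta_2 x\), the minimization reduces to a weighted linear system that can be solved with standard linear algebra. A minimal sketch with NumPy, on hypothetical data (the measurements and uncertainties are invented for illustration):

import numpy as np

# hypothetical measurements and uncertainties
x = np.array ([0., 1., 2., 3., 4.])
y = np.array ([1.1, 2.9, 5.2, 7.1, 8.8])
sigma_y = 0.3 * np.ones (len (y))

# design matrix: one column per function h_i(x), each row weighted by 1/sigma_i
A = np.column_stack ([np.ones (len (x)), x]) / sigma_y[:, None]
b = y / sigma_y

# least-squares solution of the weighted linear system A theta = b
theta, _, _, _ = np.linalg.lstsq (A, b, rcond = None)
print ('theta_1 =', theta[0], ' theta_2 =', theta[1])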
11.2. A Regression exercise#
Using pseudo-random number generation, it is possible to simulate the collection of N independent measurements \((x_i,y_i)\) based on an initial model \(y=\phi(x,\theta)\), for example:
# pseudo-random fluctuations around the model
epsilons = generate_TCL_ms (0., epsilon_sigma, 10)
# x coordinates of the measurements
x_coord = np.arange (0, 10, 1)
# y coordinates: model value plus fluctuation
y_coord = np.zeros (10)
for i in range (x_coord.size) :
    y_coord[i] = func (x_coord[i], m_true, q_true) + epsilons[i]
where pseudo-random number generation (generate_TCL_ms) is used to draw the values of the terms \(\epsilon_i\). The \(x_i\) points may also be generated in a pseudo-random manner.
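In the example, func is the straight-line model of the exercise, while generate_TCL_ms is assumed to be the central-limit-theorem Gaussian generator introduced in the previous lectures; a minimal sketch of both, for illustration:

import numpy as np

def func (x, m, q) :
    '''the straight-line model of the exercise'''
    return m * x + q

def generate_TCL_ms (mean, sigma, N, N_sum = 10) :
    '''approximately Gaussian pseudo-random numbers obtained, via the central
       limit theorem, from sums of uniform numbers (a sketch: the actual
       implementation comes from the previous lectures)'''
    events = np.zeros (N)
    for i in range (N) :
        s = np.sum (np.random.uniform (0., 1., N_sum))
        # a sum of N_sum uniform numbers has mean N_sum/2 and variance N_sum/12
        events[i] = mean + sigma * (s - 0.5 * N_sum) / np.sqrt (N_sum / 12.)
    return events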
11.2.1. Plotting the data#
Pairs of measurements like those generated are usually represented in the form of scatter plots:
sigma_y = epsilon_sigma * np.ones (len (y_coord))
fig, ax = plt.subplots ()
ax.set_title ('linear model', size=14)
ax.set_xlabel ('x')
ax.set_ylabel ('y')
ax.errorbar (x_coord, y_coord, xerr = 0.0, yerr = sigma_y, linestyle = 'None', marker = 'o')
where sigma_y is the expected uncertainty of each measurement in this case
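To actually display the canvas on screen, or save it to a file, the usual matplotlib calls apply:

plt.show ()                     # display the plot
# fig.savefig ('scatter.png')  # alternatively, save it to a file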
11.2.2. Parameters determination#
The iminuit library provides tools that automatically apply the least-squares method to find the parameters. The fit operation is performed with the following instructions:
from iminuit import Minuit
from iminuit.cost import LeastSquares

# generate a least-squares cost function
least_squares = LeastSquares (x_coord, y_coord, sigma_y, func)
my_minuit = Minuit (least_squares, m = 0, q = 0)  # starting values for m and q
my_minuit.migrad ()  # finds minimum of least_squares function
my_minuit.hesse ()   # accurately computes uncertainties
the least_squares object contains the cost function that will be minimized
the my_minuit object performs the actual minimization, starting the search from the initial point m = 0, q = 0 in the parameter phase space
the my_minuit.migrad () call performs the minimization
the my_minuit.hesse () call evaluates the parameter uncertainties by computing an approximation of the parameter covariance matrix
the model func is expected to take the independent variable x as its first argument, followed by the named parameters (here func (x, m, q)), which is how Minuit knows the parameter names m and q
11.3. Analyzing the result of the regression#
11.3.1. Fit convergence#
Whether the minimization has been successful can be checked:
# global characteristics of the fit
is_valid = my_minuit.valid
print ('success of the fit: ', is_valid)
The value of \(Q^2\) and the number of degrees of freedom may be obtained as well:
Q_squared = my_minuit.fval
print ('value of the fit Q-squared', Q_squared)
N_dof = my_minuit.ndof
print ('value of the number of degrees of freedom', N_dof)
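Since the expectation value of a \(\chi^2\) variable equals its number of degrees of freedom, the ratio \(Q^2/N_{\text{dof}}\) is expected to be close to one for a good fit; a quick check, using the variables just defined:

print ('reduced Q-squared:', Q_squared / N_dof)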
11.3.2. Fit quality#
In case the probability density distribution of the individual \(y_i\) follows a Gaussian probability density function, the \(Q^2_{\text{min}}\) quantity follows the \(\chi^2\) distribution with \(N-k\) degrees of freedom, where \(N\) is the number of fitted points and \(k\) is the number of determined parameters
Under these conditions, the \(\chi^2\) test can be used to determine the goodness of the fit by calculating the probability that the result could be worse than what was obtained, integrating the \(\chi^2(N-k)\) distribution from \(Q^2_{\text{min}}\) to infinity.
The values of \(Q^2_{\text{min}}\) and the number of degrees of freedom can be obtained from the Minuit variable as well:

print ('Value of Q2: ', my_minuit.fval)
print ('Number of degrees of freedom: ', my_minuit.ndof)
The p-value associated with the fit result may be calculated from the cumulative distribution function of the \(\chi^2\) probability density function:
from scipy.stats import chi2
# ...
print ('associated p-value: ', 1. - chi2.cdf (my_minuit.fval, df = my_minuit.ndof))
11.3.3. Parameter values and uncertainty#
The fit results are stored in the my_minuit object:

for par, val, err in zip (my_minuit.parameters, my_minuit.values, my_minuit.errors) :
    print(f'{par} = {val:.3f} +/- {err:.3f}')  # formatted output
m_fit = my_minuit.values[0]
q_fit = my_minuit.values[1]
my_minuit.parameters contains the parameter names
my_minuit.values contains the parameter values
my_minuit.errors contains the parameter uncertainties
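As a usage example, the fitted model may be superimposed on the scatter plot of the previous section (assuming func, x_coord and the ax object defined above are still available):

x_fit = np.linspace (min (x_coord), max (x_coord), 100)
ax.plot (x_fit, func (x_fit, m_fit, q_fit), label = 'fitted model', color = 'red')
ax.legend ()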
11.3.4. Covariance matrix of the fit parameters#
The covariance matrix of the resulting parameters can be printed on the screen:
print (my_minuit.covariance)
Its individual values can be accessed as well, either by numerical index, or by using parameter names:
print (my_minuit.covariance[0][1])
print (my_minuit.covariance['m', 'q'])
The parameter correlation matrix is easily calculated from the covariance one:
print (my_minuit.covariance.correlation ())
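Each correlation coefficient is the corresponding covariance element divided by the product of the standard deviations of the two parameters, for example:
\[ \rho_{mq} = \frac{\mathrm{cov}(m,q)}{\sigma_m \, \sigma_q} \]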
11.3.5. Quick inspection of the fit results#
The screen output of the fit summarises all the information above, and may be displayed with the call:
display (my_minuit)
11.4. Uncertainty determination and the least-squares method#
In the case where the random variables \(\epsilon_i\) have a Gaussian probability density distribution, the value of \(Q^2\) associated with the fit follows a \(\chi^2\) distribution with a number of degrees of freedom equal to the degrees of freedom of the fit
the number of degrees of freedom of the fit is equal to the number of data points minus the number of estimated parameters
This property of the least-squares method allows, assuming that the random variables \(\epsilon_i\) are Gaussian, the estimation of the uncertainties on the values \(y_i\), as sketched below
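A minimal sketch of the idea, assuming that all the \(y_i\) share the same unknown uncertainty \(\sigma\): performing the fit with unit weights and using the fact that the expectation value of \(Q^2_{\text{min}}\) is \(N-k\), the uncertainty may be estimated from the residuals as
\[ \hat{\sigma}^2 = \frac{1}{N-k} \sum_{i=1}^{N} \left( y_i - \phi(x_i,\hat{\theta}) \right)^2 \]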
Note
The exercises for the lecture may be found here