10. Parameter Estimation using the Maximum Likelihood Method#

10.1. Parameter Determination#

  • Often, the goal of an experiment is the estimation of parameters of a model.

  • To achieve this result, a dataset \(x_i\) is collected and used as input to algorithms called estimators, which estimate the parameters of interest.

  • The estimates produced by an estimator are random variables because estimators are functions of random numbers (the data).

    • They have their own probability distribution

  • There are libraries that perform this task for us. Among them, iminuit contains various algorithms for this purpose. In jargon, the parameter determination operation is called fitting.

10.2. Maximum Likelihood#

  • The technique of maximum likelihood is based on the assumption that the estimate of the sought-after parameters corresponds to the value that maximizes the likelihood, defined as the product of the probability density function evaluated at each measurement:

    \[ \mathcal{L}(\theta)= \mathcal{L}(\theta, x)= f(x_1,\theta) \times \ldots \times f(x_N,\theta) = \prod_{i=1}^{N} f(x_{i},\theta) \]
  • The likelihood is a function of both the measurements and the parameters, but we emphasize the dependence on the parameters because, once the measurements have been performed, the data are fixed.

  • The estimator of the parameters is obtained by solving the equation:

    \[ \frac{\partial \mathcal{L}(\theta)}{\partial\theta} = 0 \]

10.3. Logarithm of the Likelihood#

  • Usually, the logarithm of the likelihood function is used, indicated with the lowercase letter \(\ell\):

\[ \ell(\theta) = \log (\mathcal{L}(\theta)) \]
  • In fact, since the logarithm is a monotonically increasing function, the extrema of a function and its logarithm are found at the same point.

  • The logarithm of a product of terms is equal to the sum of the logarithms of the individual terms. Therefore, taking the derivative of the logarithm of the likelihood function is simpler than taking the derivative of the likelihood function:

    \[ \frac{\partial \ell(\theta)}{\partial\theta} = \frac{\partial\log(\mathcal{L}(\theta))}{\partial\theta} = \frac{\partial\log\left(\prod_{i=1}^N f(x_i,\theta)\right)}{\partial\theta} = \sum_{i=1}^N \frac{\partial\log\left(f(x_i,\theta)\right)}{\partial\theta} \]

10.4. Uncertainty of the Parameter Estimate#

  • A graphical method exists for determining the sigma associated with the parameters estimated using the maximum likelihood method.

  • This involves finding the points of intersection between the log-likelihood function and the horizontal line whose ordinate equals the maximum of the log-likelihood minus 0.5, and taking half the distance between them.

    Why \(-0.5\)?

    To demonstrate that the points corresponding to \(\hat{\theta}\pm\sigma_{\hat{\theta}}\) can be obtained under the condition \(\ell(\theta) = \ell(\hat{\theta}) - 0.5\), expand the function \(\ell(\theta)\) in a Taylor series around the estimate \(\hat{\theta}\), approximate the expected value of the second derivative of \(\ell(\theta)\) with its value at the maximum, and substitute it with the variance of the estimator, applying the Rao-Cramér theorem, as sketched below.
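
    A minimal sketch of the computation (assuming an unbiased, efficient estimator, so that the Rao-Cramér bound gives the variance): since the first derivative vanishes at the maximum,

    \[ \ell(\theta) \simeq \ell(\hat{\theta}) + \frac{1}{2} \frac{\partial^{2}\ell}{\partial\theta^{2}}\bigg|_{\hat{\theta}} (\theta-\hat{\theta})^{2}\,, \qquad \sigma^{2}_{\hat{\theta}} \simeq \left( -\frac{\partial^{2}\ell}{\partial\theta^{2}}\bigg|_{\hat{\theta}} \right)^{-1} \]

    so that, evaluating at \(\theta = \hat{\theta} \pm \sigma_{\hat{\theta}}\),

    \[ \ell(\hat{\theta} \pm \sigma_{\hat{\theta}}) \simeq \ell(\hat{\theta}) - \frac{1}{2} \]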

10.5. Properties of the Maximum Likelihood Parameter Estimates#

  • They are consistent

  • They are asymptotically unbiased, meaning they have zero bias as the number of measurements N tends to infinity

  • They are asymptotically efficient, implying they have the minimum possible variance for the number of measurements N tending to infinity

10.6. Building a Likelihood and Determining a Parameter#

  • We will use the example of the exponential distribution to determine its sole parameter τ using the maximum likelihood method:

    \[ f(x,\tau) = \frac{1}{\tau}e^{-\frac{x}{\tau}} \]
  • It can be shown analytically that the arithmetic mean of the data is an estimator of the parameter τ (see the short derivation after this list).

  • In this case, we want to construct a numerical estimator of the parameter, as an example of the general case where an analytical calculation is not feasible.
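
  • As a cross-check of that analytical result, for the exponential model the log-likelihood and its derivative are

    \[ \ell(\tau) = \sum_{i=1}^{N} \log f(x_i,\tau) = -N \log\tau - \frac{1}{\tau}\sum_{i=1}^{N} x_i\,, \qquad \frac{\partial\ell(\tau)}{\partial\tau} = -\frac{N}{\tau} + \frac{1}{\tau^{2}}\sum_{i=1}^{N} x_i \]

    and the derivative vanishes exactly at the arithmetic mean:

    \[ \hat{\tau} = \frac{1}{N}\sum_{i=1}^{N} x_i \]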

10.6.1. Finding the Maximum of the Logarithm of the Likelihood#

  • The golden ratio algorithm developed during Lecture 6 can be used to find the maximum of the log-likelihood (the body shown here is a minimal sketch of a standard golden-section search):

    def sezioneAureaMax_LL (
        g,              # log-likelihood function to maximize
        pdf,            # probability density function of the events
        sample,         # sample of the events
        x0,             # one extreme of the interval
        x1,             # other extreme of the interval
        prec = 0.0001): # precision of the calculation
        r = 0.618034    # golden ratio: keep the sub-interval whose inner point has the larger g
        while (abs (x1 - x0) > prec) :
            x2 = x0 + r * (x1 - x0)
            x3 = x0 + (1. - r) * (x1 - x0)
            if g (x3, pdf, sample) < g (x2, pdf, sample) : x0 = x3
            else                                         : x1 = x2
        return 0.5 * (x0 + x1)
    
    • The function finds the maximum of a generic function of one parameter over an interval

    • The input parameters are

      • the function to find the extremum of (g),

      • the single-measurement probability density function (pdf),

      • the list containing the data (sample),

      • the extremes of the interval in which to search for the maximum of the parameter τ (x0, x1),

      • and the precision at which to stop the calculation (prec), with a default value.

10.6.2. Coding Example#

  • After generating pseudo-random numbers distributed according to an exponential probability density (which can be visualized with a histogram), the developed functions can be used, starting from a reasonably chosen interval in which to search for the maximum (see the sketch after this list)

  • The result of this algorithm can be compared with the arithmetic mean of the numbers, which for this particular probability distribution is an estimator of τ
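
  • A minimal, self-contained sketch of these steps; the names exp_pdf, loglikelihood, and the value of tau_true are choices made here for illustration, matching the calls used later in this section:

    import random
    from math import exp, log

    def exp_pdf (x, tau) :
        '''exponential probability density function'''
        return exp (-x / tau) / tau

    def loglikelihood (theta, pdf, sample) :
        '''sum of the logarithms of the pdf evaluated at each data point'''
        logL = 0.
        for x in sample :
            if pdf (x, theta) > 0. : logL = logL + log (pdf (x, theta))
        return logL

    # generate the sample with the inverse of the exponential CDF
    tau_true = 2.
    sample = [-tau_true * log (1. - random.random ()) for _ in range (1000)]

    tau_hat = sezioneAureaMax_LL (loglikelihood, exp_pdf, sample, 0.5, 5.)
    print (tau_hat, sum (sample) / len (sample))  # compare to the arithmetic mean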

10.7. Uncertainty of the \(\tau\) Estimate#

  • The estimator of τ is a random variable, meaning it has its own probability distribution

  • Therefore, in addition to having a point estimate obtained by maximizing the logarithm of the likelihood, it also has a sigma

  • A graphical method is often used to determine this sigma, based on the fact that the likelihood is asymptotically a Gaussian function with respect to the parameter, so the log-likelihood function is parabolic

  • It can be demonstrated that the two points τ - στ and τ + στ are the zeros of the following function (a code sketch follows the formula):

    \[ h(\tau) = \ell(\tau) - \ell(\tau_{\textrm{max}}) + \frac{1}{2} \]
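
  • The difference \(\ell(\tau) - \ell(\tau_{\textrm{max}})\) can be computed by a small helper function; the sketch below assumes the loglikelihood function defined earlier and matches the signature used in the main program further on:

    def loglikelihood_ratio (theta, pdf, sample, theta_hat) :
        '''difference between the log-likelihood at theta and at its maximum'''
        return loglikelihood (theta, pdf, sample) - loglikelihood (theta_hat, pdf, sample)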

10.7.1. Visualizing the Likelihood#

  • Plotting the function h(τ) while varying the number of events used to calculate the log-likelihood function gives the following outcome (figure: loglikelihood_profile)

  • As the number of used events increases, the function h(τ) becomes narrower, meaning the sigma of the estimator decreases

  • As the number of used events increases, the function h(τ) becomes more symmetric, indicating its asymptotic behavior

10.7.2. Finding the Intersection Points#

  • The bisection method can be used to find τ - στ and τ + στ

    def intersect_LLR (
        g,              # function whose zero is sought
        pdf,            # probability density function of the events
        sample,         # sample of the events
        xMin,           # minimum of the interval
        xMax,           # maximum of the interval
        ylevel,         # value of the horizontal intersection
        theta_hat,      # maximum of the likelihood
        prec = 0.0001): # precision of the calculation
        '''
        Calculates zeros
        with the bisection method
        '''
        def gprime (x) :
            return g (x, pdf, sample, theta_hat) - ylevel

        xAve = xMin
        while ((xMax - xMin) > prec) :
            xAve = 0.5 * (xMax + xMin)
            if (gprime (xAve) * gprime (xMin) > 0.) : xMin = xAve
            else                                    : xMax = xAve
        return xAve
    

10.8. Use in the Main Program#

  • Within a relatively narrow interval around the maximum of the log-likelihood function, the function h(τ) has two zeros, one to the right and one to the left of its maximum

  • By invoking the bisection function twice, the desired points can be calculated:

    from likelihood import sezioneAureaMax_LL, intersect_LLR
    # ...
    tau_hat = sezioneAureaMax_LL (loglikelihood, exp_pdf, sample, 0.5, 5., 0.0001)    
    tau_hat_minusS = intersect_LLR (loglikelihood_ratio, exp_pdf, sample, 0.5, tau_hat, -0.5, tau_hat)
    tau_hat_plusS = intersect_LLR (loglikelihood_ratio, exp_pdf, sample, tau_hat, 5., -0.5, tau_hat)
    
  • The interval between the two intersection points tau_hat_minusS and tau_hat_plusS is the confidence interval associated with the obtained estimate; its half-width gives the uncertainty, as in the sketch below
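
  • A one-line follow-up to the snippet above, using the names already introduced:

    sigma_tau = 0.5 * (tau_hat_plusS - tau_hat_minusS)
    print ('tau =', tau_hat, '+/-', sigma_tau)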

10.8.1. Comparison to the Analytical Estimate#

  • In the case of the exponential distribution, it is known that the variance is the square of the mean

  • The uncertainty on the mean is the square root of the variance, divided by the square root of the number of events

  • Thus, the uncertainty on the estimator of τ, denoted with a circumflex accent, can be estimated in this case as:

    \[ \sigma_{\hat{\tau}} = \frac{\sigma}{\sqrt{N}} = \frac{\hat{\tau}}{\sqrt{N}} \]
  • Therefore, the value obtained from the graphical method can be compared with the one calculated from the arithmetic mean.
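
  • In code, this comparison takes a couple of lines (a sketch, assuming sample is the list of measurements used for the fit):

    from math import sqrt
    mean = sum (sample) / len (sample)
    sigma_tau_analytic = mean / sqrt (len (sample))  # tau-hat / sqrt(N)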

10.9. Probability Distribution of the Estimators#

  • The probability distribution of estimators can be reconstructed in a frequentist manner, simulating the experiment of collecting events many times, using the toy experiment technique

  • To generate a toy experiment, the true value of the parameter (tau_true in the present case) and the number of collected events (sample_size) need to be hypothesized

  • To construct the distribution of the estimator of τ, two procedures need to be repeated many times (N_toys):

    • Generation of a toy experiment

    • Calculation of the estimator given that generation, as if they were the measured events

10.9.1. Generating a Toy Experiment#

  • To generate a toy experiment, pseudo-random numbers are usually used, employing existing algorithms adapted to the case at hand

  • To generate pseudo-random numbers according to an exponential distribution, the technique of the inverse of the cumulative function can be used:

    from myrand import generate_exp
    # ...
    singleToy = generate_exp (tau_true, sample_size)
    
    • The function takes the true value of τ as input and returns a list of pseudo-random numbers distributed according to the corresponding exponential probability distribution
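
    • The function is imported from the myrand module; a minimal sketch of what generate_exp could look like, inverting the exponential cumulative function \(F(x) = 1 - e^{-x/\tau}\):

      import random
      from math import log

      def generate_exp (tau, N) :
          '''N pseudo-random numbers via the inverse of the exponential CDF'''
          return [-tau * log (1. - random.random ()) for _ in range (N)]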

10.9.2. Finding the Parameter with the Maximum Likelihood#

  • The maximum likelihood method can be applied as developed earlier, obtaining a result for each toy experiment

    tau_hats = []
    for iToy in range (N_toys) :
        # generate one toy experiment and estimate tau on it as if it were measured data
        singleToy = generate_exp (tau_true, sample_size)
        tau_hat_toy = sezioneAureaMax_LL (loglikelihood, exp_pdf, singleToy, 0.5, 5., 0.0001)
        tau_hats.append (tau_hat_toy)
    

10.9.3. Exercise Results#

  • Both steps are incorporated into a general loop, within which a histogram (or other statistical tools) can be populated, as in the sketch after this list

  • The result clearly shows the evolution of the probability distribution of the estimator as the number of available measurements increases (figure: estimator_pdf_2.png)
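
  • A minimal sketch of how the resulting list can be histogrammed with matplotlib (the binning is an arbitrary choice):

    import matplotlib.pyplot as plt

    plt.hist (tau_hats, bins = 50)
    plt.xlabel ('estimate of tau')
    plt.ylabel ('number of toy experiments')
    plt.show ()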

Note

  • The exercises for the lecture can be found here