Method of maximum likelihood

Finds the value of $θ$ under which the observed data are most probable, compared with every other value of $θ$

If $x_{1}, \dots, x_{n}$ are observed values of a random sample from a population with parameter $θ$, the likelihood function of $θ$ is

L (θ) = L (θ; x_{1}, \dots, x_{n}) = f (x_{1}, \dots, x_{n}; θ) = \prod_{i = 1}^{n} f (x_{i}; θ)

The maximum likelihood estimate (MLE) of $θ$ is the value of $θ$ that maximizes the likelihood function $L (θ)$

In the regular case, we work with the log-likelihood function, since we then only need to differentiate a sum of functions instead of a product

l (θ) = \ln L (θ) = \ln \prod f (x_{i}; θ) = \sum_{i = 1}^{n} \ln f (x_{i}; θ)

By a lemma (since $\ln$ is strictly increasing), the $\hat{θ}$ that maximizes $L (θ)$ also maximizes $l (θ)$
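As a quick numerical check of the claim that $L (θ)$ and $l (θ)$ share a maximizer, here is a minimal Python sketch. The exponential density with mean $θ$, the sample `xs`, and the grid of candidate values are all assumptions chosen for illustration; they are not part of the notes.

```python
import math

def likelihood(theta, xs, density):
    """L(theta): product of the density over the sample."""
    return math.prod(density(x, theta) for x in xs)

def log_likelihood(theta, xs, density):
    """l(theta): sum of the log-density over the sample."""
    return sum(math.log(density(x, theta)) for x in xs)

def exp_density(x, theta):
    """Assumed example density: exponential with mean theta."""
    return math.exp(-x / theta) / theta

xs = [0.8, 1.5, 2.1, 0.4, 3.2]            # hypothetical observations
grid = [0.1 * k for k in range(1, 101)]   # candidate theta values 0.1, ..., 10.0

# Maximize each function separately over the same grid
argmax_L = max(grid, key=lambda t: likelihood(t, xs, exp_density))
argmax_l = max(grid, key=lambda t: log_likelihood(t, xs, exp_density))

print(argmax_L, argmax_l)  # both searches land on the same grid point
```

A grid search only approximates the maximizer, but it is enough to see that taking logs does not move it.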

Properties

If a sufficient statistic for $θ$ exists, then the MLE is a function of it
The MLE is asymptotically efficient (under regularity conditions)
Invariance principle: if $\hat{θ}$ is the MLE of $θ$ , then $g (\hat{θ})$ is the MLE of the function $g (θ)$
Lack of uniqueness: there could be more than one MLE

Example

An experiment of 6 coin tosses yields 2 heads

General pmf with arbitrary parameter $p$ (the probability of heads):

f (2; p) = \binom{6}{2} p^{2} (1 - p)^{4}

Different values of $p$ give us different probabilities of getting that sample

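\begin{aligned}
p & = \frac{1}{4}, & f (2; 1 / 4) & = 15 \cdot \frac{1}{16} \cdot \frac{81}{256} \approx 0.297 \\
p & = \frac{1}{3}, & f (2; 1 / 3) & = 15 \cdot \frac{1}{9} \cdot \frac{16}{81} \approx 0.329
\end{aligned}

Of the two, $p = 1 / 3$ makes the observed sample more likely; it is in fact the maximizer, since $\hat{p} = 2 / 6$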
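The coin-toss comparison can be reproduced with a short Python sketch. The function name `binom_likelihood` and the grid spacing of 1/600 are choices made here for illustration only.

```python
import math

def binom_likelihood(p, n=6, k=2):
    """Probability of k heads in n tosses when P(heads) = p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

like_quarter = binom_likelihood(1 / 4)   # 1215/4096, about 0.297
like_third = binom_likelihood(1 / 3)     # 240/729, about 0.329

# Grid search over p in (0, 1); the spacing 1/600 is an arbitrary choice
grid = [i / 600 for i in range(1, 600)]
p_hat = max(grid, key=binom_likelihood)

print(round(like_quarter, 3), round(like_third, 3), p_hat)
```

On this grid the largest likelihood is found at $p = 200 / 600 = 1 / 3$, the observed proportion of heads.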

Finding the MLE for $X_{1}, \dots, X_{n} \overset{iid}{\sim} \text{Exponential} (θ)$, parameterized by its mean $θ$

\begin{aligned}
L (θ) & = \prod \frac{1}{θ} e^{- x_{i} / θ} = θ^{- n} e^{- \sum x_{i} / θ} \\
l (θ) & = \ln L (θ) = \ln \left( θ^{- n} e^{- \sum x_{i} / θ} \right) = - n \ln θ - \frac{\sum x_{i}}{θ} \\
\frac{d l (θ)}{d θ} & = - \frac{n}{θ} + \frac{\sum x_{i}}{θ^{2}} = 0 \quad \text{(critical point)} \\
\hat{θ} & = \frac{\sum x_{i}}{n} = \bar{X} \\
\left. \frac{d^{2} l (θ)}{d θ^{2}} \right|_{θ = \hat{θ} = \bar{X}} & = \frac{n}{\bar{X}^{2}} - \frac{2 \sum x_{i}}{\bar{X}^{3}} = \frac{n \bar{X} - 2 n \bar{X}}{\bar{X}^{3}} = \frac{- n}{\bar{X}^{2}} < 0 \\
∴ \ l (θ) & \text{ has a maximum at } \hat{θ} = \bar{X} \text{, the MLE of } θ
\end{aligned}
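The calculus for the exponential case can be checked numerically. This sketch assumes the mean parameterization $f (x; θ) = \frac{1}{θ} e^{- x / θ}$, for which the first derivative of $l (θ)$ is $- n / θ + \sum x_{i} / θ^{2}$; the sample `xs` and the function names are hypothetical.

```python
xs = [0.8, 1.5, 2.1, 0.4, 3.2]   # hypothetical observations
n = len(xs)
total = sum(xs)

def score(theta):
    """First derivative of l(theta) = -n ln(theta) - sum(x_i)/theta."""
    return -n / theta + total / theta**2

def second_derivative(theta):
    """Second derivative of l(theta)."""
    return n / theta**2 - 2 * total / theta**3

theta_hat = total / n   # the sample mean

# The score should vanish at the sample mean (up to rounding),
# and the second derivative there should equal -n / mean^2 < 0.
print(score(theta_hat), second_derivative(theta_hat), -n / theta_hat**2)
```

A zero score with a negative second derivative is what identifies the sample mean as a maximum rather than a minimum.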

In an irregular case, $X_{1}, \dots, X_{n} \overset{iid}{\sim} \text{Uniform} (0, θ)$

\begin{aligned}
L (θ) & = \prod f (x_{i}; θ) = \prod \frac{1}{θ} = θ^{- n} \\
l (θ) & = \ln θ^{- n} = - n \ln θ \\
\frac{d l (θ)}{d θ} & = - \frac{n}{θ} = 0 \to \text{no solution, so bring in the support of the density} \\
L (θ) & = \prod \frac{1}{θ} I [0 < x_{i} \le θ] = \frac{1}{θ^{n}} I [0 < x_{1}, \dots, x_{n} \le θ] = \frac{1}{θ^{n}} I [θ \ge X_{(n)}] \\
& \text{aim to increase } L (θ) \text{ by decreasing } θ \text{ to its lowest value possible, } X_{(n)} \\
∴ \ \hat{θ} & = X_{(n)} \text{ is the MLE of } θ
\end{aligned}
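To see why the maximum sits at the largest observation, the likelihood with its indicator can be evaluated directly. This is a sketch under the assumption that the support is closed on the right ($x_{i} \le θ$), so that the maximum is attained; the sample `xs` and the function name are hypothetical.

```python
def uniform_likelihood(theta, xs):
    """L(theta) for Uniform(0, theta): theta^(-n) if every x_i lies in (0, theta], else 0."""
    if theta <= 0 or not all(0 < x <= theta for x in xs):
        return 0.0
    return theta ** (-len(xs))

xs = [0.9, 2.4, 1.7, 3.1, 0.3]   # hypothetical observations
theta_hat = max(xs)              # the largest order statistic X_(n)

for theta in (theta_hat - 0.01, theta_hat, theta_hat + 0.1, theta_hat + 1.0):
    print(theta, uniform_likelihood(theta, xs))
```

Below $X_{(n)}$ the likelihood is zero, and above it the likelihood shrinks as $θ$ grows, so no derivative condition is needed.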

Case: 2+ parameters (with Hessian)
MLEs of $μ, μ^{2}, σ, σ^{2}$ for $X_{1}, \dots, X_{n} \overset{iid}{\sim} N (μ, σ^{2})$

\begin{aligned}
L (θ) & = L (μ, σ^{2}; \tilde{X}) = (2 π σ^{2})^{- n / 2} e^{- \frac{1}{2 σ^{2}} \sum (x_{i} - μ)^{2}} \\
l (μ, σ^{2}) & = - \frac{n}{2} \ln (2 π σ^{2}) - \frac{1}{2 σ^{2}} \sum (x_{i} - μ)^{2} \\
& = - \frac{n}{2} \ln (2 π) - \frac{n}{2} \ln (σ^{2}) - \frac{1}{2 σ^{2}} \sum (x_{i} - μ)^{2} \\
\frac{\partial l}{\partial μ} & = - \frac{1}{2 σ^{2}} \sum 2 (x_{i} - μ) (- 1) = \frac{\sum (x_{i} - μ)}{σ^{2}} = 0 \to \hat{μ} = \bar{X} \\
\frac{\partial l}{\partial σ^{2}} & = - \frac{n}{2 σ^{2}} + \frac{\sum (x_{i} - μ)^{2}}{2 (σ^{2})^{2}} = \frac{- n σ^{2} + \sum (x_{i} - μ)^{2}}{2 (σ^{2})^{2}} = 0 \to {\hat{σ}}^{2} = \frac{\sum (x_{i} - μ)^{2}}{n} \\
& \text{using } \hat{μ} = \bar{X} \text{, we have } {\hat{σ}}^{2} = \frac{1}{n} \sum (x_{i} - \bar{X})^{2}
\end{aligned}

\text{if } \begin{bmatrix} \frac{\partial^{2} l}{\partial μ^{2}} & \frac{\partial^{2} l}{\partial μ \, \partial σ^{2}} \\ \frac{\partial^{2} l}{\partial σ^{2} \, \partial μ} & \frac{\partial^{2} l}{\partial (σ^{2})^{2}} \end{bmatrix} \text{ is negative definite at } (\hat{μ}, {\hat{σ}}^{2}) \text{, then our } \hat{μ}, {\hat{σ}}^{2} \text{ are MLEs of } μ, σ^{2}
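The Hessian condition can be checked numerically for a concrete sample. The sketch below differentiates the normal log-likelihood a second time by hand, giving $\frac{\partial^{2} l}{\partial μ^{2}} = - \frac{n}{σ^{2}}$, $\frac{\partial^{2} l}{\partial μ \, \partial σ^{2}} = - \frac{\sum (x_{i} - μ)}{(σ^{2})^{2}}$ and $\frac{\partial^{2} l}{\partial (σ^{2})^{2}} = \frac{n}{2 (σ^{2})^{2}} - \frac{\sum (x_{i} - μ)^{2}}{(σ^{2})^{3}}$. The sample `xs` and all names are hypothetical, and "negative definite" is tested with the usual criterion for a symmetric 2x2 matrix (top-left entry negative, determinant positive).

```python
xs = [2.0, 4.0, 4.0, 5.0, 7.0]   # hypothetical observations
n = len(xs)

mu_hat = sum(xs) / n
var_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # divides by n, not n - 1

def hessian(mu, var):
    """Second partial derivatives of l(mu, sigma^2) for a normal sample."""
    s1 = sum(x - mu for x in xs)
    s2 = sum((x - mu) ** 2 for x in xs)
    h11 = -n / var                           # d^2 l / d mu^2
    h12 = -s1 / var**2                       # d^2 l / d mu d sigma^2
    h22 = n / (2 * var**2) - s2 / var**3     # d^2 l / d (sigma^2)^2
    return h11, h12, h22

h11, h12, h22 = hessian(mu_hat, var_hat)
det = h11 * h22 - h12**2

# Symmetric 2x2 matrix: negative definite iff h11 < 0 and det > 0
is_negative_definite = h11 < 0 and det > 0
print(h11, h12, h22, is_negative_definite)
```

At the critical point the off-diagonal entry vanishes and the diagonal entries reduce to $- n / {\hat{σ}}^{2}$ and $- n / (2 ({\hat{σ}}^{2})^{2})$, both negative.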

Then, by the invariance principle, the MLE of $μ^{2}$ is $\widehat{μ^{2}} = g_{1} (\hat{μ}) = g_{1} (\bar{X}) = {\bar{X}}^{2}$
And the MLE of $σ$ is $\hat{σ} = g_{2} ({\hat{σ}}^{2}) = \sqrt{\frac{1}{n} \sum (x_{i} - \bar{X})^{2}}$
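A minimal sketch of the plug-in step, using Python's standard `statistics` module. The sample `xs` is hypothetical; `statistics.pvariance` is used because it divides by $n$, matching ${\hat{σ}}^{2}$, whereas `statistics.variance` divides by $n - 1$.

```python
import math
import statistics

xs = [2.0, 4.0, 4.0, 5.0, 7.0]   # hypothetical observations

mu_hat = statistics.fmean(xs)                    # MLE of mu: the sample mean
var_hat = statistics.pvariance(xs, mu=mu_hat)    # MLE of sigma^2: divides by n

# Invariance: apply g to the MLE to get the MLE of g(parameter)
mu_sq_hat = mu_hat**2            # MLE of mu^2
sigma_hat = math.sqrt(var_hat)   # MLE of sigma

print(mu_hat, var_hat, mu_sq_hat, sigma_hat)
```

No new maximization is needed for $μ^{2}$ or $σ$; each estimate is obtained by transforming one already found.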

Case: an interval of solutions
$X_{1}, \dots, X_{n} \overset{iid}{\sim} \text{Uniform} (θ, θ + 1)$

\begin{aligned}
L (θ) & = \prod f (x_{i}; θ) = \prod 1 \cdot I [θ \le x_{i} \le θ + 1] \\
& = I [θ \le x_{1}, \dots, x_{n} \le θ + 1] \\
& = I [θ \le X_{(1)}] \, I [X_{(n)} \le θ + 1] \\
& L \text{ is maximized (equal to 1) when both inequalities are true:} \\
& X_{(n)} - 1 \le θ \le X_{(1)} \\
\to \ & \text{every } θ \in [X_{(n)} - 1, X_{(1)}] \text{ is an MLE of } θ
\end{aligned}
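The non-uniqueness can be seen by evaluating the likelihood at a few candidate values. This sketch assumes closed endpoints for the support, $θ \le x_{i} \le θ + 1$; the sample `xs` and the function name are hypothetical.

```python
def shifted_uniform_likelihood(theta, xs):
    """L(theta) for Uniform(theta, theta + 1): 1 if all x_i lie in [theta, theta + 1], else 0."""
    return 1.0 if theta <= min(xs) and max(xs) <= theta + 1 else 0.0

xs = [3.2, 3.7, 3.5, 3.9]   # hypothetical observations

# Endpoints of the set of maximizers: X_(n) - 1 and X_(1)
lower, upper = max(xs) - 1, min(xs)

for theta in (2.8, 3.0, 3.1, 3.3):
    print(theta, shifted_uniform_likelihood(theta, xs))
```

Every candidate between `lower` and `upper` attains the maximum value 1, while candidates outside that range give 0, so the MLE is not unique here.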