Expectation-Maximization Algorithm

A broadly used algorithm for computing maximum likelihood estimates in situations where the observed data is considered incomplete

The complete data y

y=(x,z)

consists of observed data x (which is incomplete) and unobserved data z
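As a running illustration (not part of the original notes), consider a two-component Gaussian mixture: the unobserved z picks a component and the observed x is drawn from that component's normal distribution. A minimal Python sketch, with made-up names (pi_true, mu_true) and unit variances assumed purely for simplicity:

import numpy as np

# Illustrative two-component Gaussian mixture with unit variances:
# z_n ∈ {0, 1} is the unobserved component label, x_n | z_n ~ N(mu[z_n], 1).
rng = np.random.default_rng(0)
pi_true, mu_true = 0.4, np.array([-2.0, 3.0])  # hypothetical true parameters

n = 500
z = rng.binomial(1, pi_true, size=n)  # unobserved data z, with P(z=1) = pi_true
x = rng.normal(mu_true[z], 1.0)       # observed data x

# Only x is available in practice; z is the missing half of y = (x, z).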

Formulation

Suppose we have the pdfs or pmfs for the complete data fc(y;θ) and the incomplete data f(x;θ)

The complete-data log-likelihood for θ could be formed if y were fully observed and available

logLc(θ)=logfc(y;θ)=log(fc(x,z;θ))
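For the mixture sketch above, the complete-data log-likelihood splits into a label term and an observation term for each point: log Lc(θ) = Σn [log P(zn;π) + log φ(xn;μzn,1)], with φ the normal pdf. A hedged sketch (the function name complete_data_loglik is made up here):

import numpy as np
from scipy.stats import norm

def complete_data_loglik(x, z, pi, mu):
    # log fc(x, z; θ) for the two-component unit-variance mixture:
    # each term is log P(z_n; pi) + log φ(x_n; mu[z_n], 1)
    log_pz = np.where(z == 1, np.log(pi), np.log(1.0 - pi))
    log_px = norm.logpdf(x, loc=np.asarray(mu)[z], scale=1.0)
    return float(np.sum(log_pz + log_px))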

In the first iteration, the E-step computes the conditional expectation of the complete-data log-likelihood given the observed data x and θ(0), the initial value for θ

Q(θ;θ(0))=E[logLc(θ;x,z)|x,θ(0)]

The M-step is to find the value of θ that maximizes Q(θ;θ(0)); call this maximizer θ(1), so that

Q(θ(1);θ(0)) ≥ Q(θ;θ(0))

for all possible θ
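For the mixture example, this expectation is easy to carry out: log Lc(θ;x,z) is linear in z, so conditioning on x and θ(0) simply replaces each zn with rn = E[zn|xn,θ(0)], the posterior probability that xn came from component 1. A sketch of Q under these illustrative assumptions (q_function is a hypothetical name):

import numpy as np
from scipy.stats import norm

def q_function(pi, mu, pi0, mu0, x):
    # E-step quantities: responsibilities r_n = E[z_n | x_n, θ(0)]
    p1 = pi0 * norm.pdf(x, loc=mu0[1], scale=1.0)
    p0 = (1.0 - pi0) * norm.pdf(x, loc=mu0[0], scale=1.0)
    r = p1 / (p0 + p1)
    # Q(θ; θ(0)): expected complete-data log-likelihood evaluated at θ = (pi, mu)
    return float(np.sum(r * (np.log(pi) + norm.logpdf(x, mu[1], 1.0))
                        + (1 - r) * (np.log(1.0 - pi) + norm.logpdf(x, mu[0], 1.0))))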

On the (i+1)th iteration, we define:
E-Step: Calculate

Q(θ;θ(i))=E[logLc(θ;x,z)|x,θ(i)]

M-Step: Find θ(i+1) that maximizes Q(θ;θ(i)), that is

Q(θ(i+1);θ(i)) ≥ Q(θ;θ(i))

for all θ
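Under the same illustrative mixture assumptions, one full iteration can be written in a few lines, since maximizing Q(θ;θ(i)) gives closed-form weighted-average updates; em_step below is a hypothetical helper, not notation from the notes:

import numpy as np
from scipy.stats import norm

def em_step(x, pi, mu):
    # E-step: responsibilities under the current θ(i) = (pi, mu)
    p1 = pi * norm.pdf(x, loc=mu[1], scale=1.0)
    p0 = (1.0 - pi) * norm.pdf(x, loc=mu[0], scale=1.0)
    r = p1 / (p0 + p1)
    # M-step: argmax of Q(θ; θ(i)) in closed form
    pi_new = float(r.mean())
    mu_new = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                       np.sum(r * x) / np.sum(r)])
    return pi_new, mu_new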

Stopping Criteria

The pdf of the incomplete data is obtained by integrating out z (or summing, when z is discrete), which gives the marginal distribution of x

f(x;θ) = ∫fc(y;θ)dz = ∫fc(x,z;θ)dz = Σz fc(x,z;θ)
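In the mixture illustration above, z is discrete, so the marginal is the finite sum f(x;θ) = (1−π)φ(x;μ0,1) + πφ(x;μ1,1), with φ the normal pdf.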

Repeat the E and M steps alternately until convergence, that is, until the increase in the incomplete-data likelihood, L(θ(i+1)) − L(θ(i)), is smaller than a pre-specified tolerance amount.

L(θ(i))=f(x;θ(i))
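Putting the illustrative pieces together, and reusing the hypothetical em_step sketched above: iterate until the change in the incomplete-data log-likelihood drops below a tolerance. Working with log L(θ(i)) rather than L(θ(i)) avoids numerical underflow:

import numpy as np
from scipy.stats import norm

def incomplete_data_loglik(x, pi, mu):
    # log f(x; θ) = Σ_n log Σ_z fc(x_n, z; θ)   (z is discrete here)
    mix = (1.0 - pi) * norm.pdf(x, mu[0], 1.0) + pi * norm.pdf(x, mu[1], 1.0)
    return float(np.sum(np.log(mix)))

def fit_em(x, pi, mu, tol=1e-6, max_iter=500):
    # em_step is the hypothetical helper from the earlier sketch
    ll = incomplete_data_loglik(x, pi, mu)
    for _ in range(max_iter):
        pi, mu = em_step(x, pi, mu)
        ll_new = incomplete_data_loglik(x, pi, mu)
        if ll_new - ll < tol:  # EM never decreases L(θ), so this gap is ≥ 0
            break
        ll = ll_new
    return pi, mu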