Gaussian Finite Mixture Model
A model-based clustering method.
Finite mixture models address the situation in which a sample is drawn from a population consisting of a finite number of components (subpopulations). WLOG, we adopt the common form of the density for a $G$-component mixture, which gives the likelihood

$$L(\Psi) = \prod_{i=1}^{n} f(x_i; \Psi) = \prod_{i=1}^{n} \sum_{g=1}^{G} \pi_g f_g(x_i; \theta_g), \qquad \Psi = (\pi_1, \pi_2, \dots, \pi_{G-1}, \theta_1, \dots, \theta_G)'$$

where $\pi_G = 1 - \sum_{g=1}^{G-1} \pi_g$, so only $G-1$ mixing proportions are free parameters.
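For concreteness, here is a minimal numpy sketch of this likelihood (on the log scale), assuming univariate Gaussian components $f_g = N(\mu_g, \sigma_g^2)$; the function name and this choice of component density are illustrative, not prescribed by the notes.

```python
import numpy as np
from scipy.stats import norm

def mixture_log_likelihood(x, pi, mu, sigma):
    """Log-likelihood of a G-component univariate Gaussian mixture.

    x: (n,) observations; pi: (G,) mixing proportions summing to 1;
    mu, sigma: (G,) component means and standard deviations.
    """
    # densities[i, g] = f_g(x_i; theta_g)
    densities = norm.pdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])
    # f(x_i; Psi) = sum_g pi_g * f_g(x_i; theta_g)
    mixture_density = densities @ pi
    return np.sum(np.log(mixture_density))
```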
If we use the conventional method of maximum likelihood, we will not obtain an explicit solution for the parameters. For example, differentiating the log-likelihood with respect to $\pi_g$ (using $\pi_G = 1 - \sum_{g=1}^{G-1} \pi_g$) gives

$$\frac{\partial l}{\partial \pi_g} = \sum_{i=1}^{n} \frac{f_g(x_i; \theta_g) - f_G(x_i; \theta_G)}{\sum_{h=1}^{G} \pi_h f_h(x_i; \theta_h)} = 0$$

which has no closed-form solution. To use the Expectation-Maximization (EM) algorithm, we recast this as an incomplete-data problem: the observed data are $X$, and the unobserved data are the component memberships of the observations,
$$z = (z_1', \dots, z_n')', \qquad y_i = (x_i', z_i')'$$
where

$$Z_{ig} = \begin{cases} 1, & \text{if the } i\text{th obs is from the } g\text{th component} \\ 0, & \text{otherwise} \end{cases}$$

so $Z_i$ follows a Multinomial$(1, \tilde{\pi})$ distribution: the mixing proportions $\pi_g$ act as prior membership probabilities, and the E-step below computes the corresponding posterior probabilities.
Now the complete-data likelihood for Ψ has the form
$$L_c(\Psi; y) = \prod_{i=1}^{n} f(y_i; \Psi) = \prod_{i=1}^{n} f(x_i, z_i; \Psi) = \prod_{i=1}^{n} \prod_{g=1}^{G} \left[ \pi_g f_g(x_i; \theta_g) \right]^{z_{ig}}$$

Now we can formulate a general E-step and M-step for fitting a mixture model. The complete-data log-likelihood is
$$l_c(\Psi; y) = \sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig} \log\left[ \pi_g f_g(x_i; \theta_g) \right] = \sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig} \left[ \log \pi_g + \log f_g(x_i; \theta_g) \right]$$
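As a sketch of $l_c$, continuing the illustrative univariate Gaussian assumption, with the unobserved $z$ encoded as a one-hot matrix:

```python
import numpy as np
from scipy.stats import norm

def complete_data_log_likelihood(x, z, pi, mu, sigma):
    """l_c(Psi; y) = sum_i sum_g z_ig * [log pi_g + log f_g(x_i; theta_g)].

    z: (n, G) one-hot component membership indicators (unobserved in practice).
    """
    # log_f[i, g] = log f_g(x_i; theta_g)
    log_f = norm.logpdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])
    return np.sum(z * (np.log(pi)[None, :] + log_f))
```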
E-step

Given $\Psi^{(0)} = (\tilde{\pi}^{(0)}, \tilde{\theta}^{(0)})$ from initialization (or the previous iteration), estimate $Z_{ig}^{(1)}$ using Bayes' theorem:
$$Z_{ig}^{(1)} = E\left(Z_{ig} \mid \tilde{\pi}^{(0)}, \tilde{\theta}^{(0)}\right) = P(Z_{ig} = 1 \mid x_i, \Psi^{(0)}) = \frac{P(x_i \mid Z_{ig} = 1, \Psi^{(0)}) \cdot P(Z_{ig} = 1 \mid \Psi^{(0)})}{P(x_i \mid \Psi^{(0)})} = \frac{\pi_g^{(0)} f_g(x_i; \theta_g^{(0)})}{\sum_{h=1}^{G} \pi_h^{(0)} f_h(x_i; \theta_h^{(0)})}$$

which is needed in the $Q$ function:
$$\begin{aligned} Q(\Psi; \Psi^{(0)}) &= E\left( l_c(\Psi; \tilde{y}) \mid \Psi^{(0)} \right) = E\left( \sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig} \left[ \log \pi_g + \log f_g(x_i; \theta_g) \right] \,\middle|\, \tilde{\pi}^{(0)}, \tilde{\theta}^{(0)} \right) \\ &= \sum_{i=1}^{n} \sum_{g=1}^{G} E\left( Z_{ig} \mid \tilde{\pi}^{(0)}, \tilde{\theta}^{(0)} \right) \left[ \log \pi_g + \log f_g(x_i; \theta_g) \right] \end{aligned}$$

with

$$Z_{ig}^{(1)} = E\left( Z_{ig} \mid \tilde{\pi}^{(0)}, \tilde{\theta}^{(0)} \right) = \frac{\pi_g^{(0)} f_g(x_i; \tilde{\theta}_g^{(0)})}{\sum_{h=1}^{G} \pi_h^{(0)} f_h(x_i; \tilde{\theta}_h^{(0)})}$$
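A minimal numpy sketch of computing the $Z_{ig}^{(1)}$'s, again under the illustrative univariate Gaussian assumption (the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, pi, mu, sigma):
    """Posterior membership probabilities Z_ig^(1) via Bayes' theorem.

    Returns an (n, G) matrix whose rows sum to 1.
    """
    # Numerator: pi_g^(0) * f_g(x_i; theta_g^(0)) for every (i, g) pair
    weighted = pi[None, :] * norm.pdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])
    # Denominator: the mixture density f(x_i; Psi^(0)), summed over components
    return weighted / weighted.sum(axis=1, keepdims=True)
```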
M-step

Find $\Psi = (\tilde{\pi}, \tilde{\theta})$ that maximizes $Q(\Psi; \Psi^{(0)})$, based on the $Z_{ig}^{(1)}$ just computed:
$$Q(\Psi; \Psi^{(0)}) = \sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig}^{(1)} \left[ \log \pi_g + \log f_g(x_i; \tilde{\theta}_g) \right] = \underbrace{\sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig}^{(1)} \log \pi_g}_{Q_1(\tilde{\pi};\, \tilde{\pi}^{(0)})} + \underbrace{\sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig}^{(1)} \log f_g(x_i; \tilde{\theta}_g)}_{Q_2(\tilde{\theta};\, \tilde{\theta}^{(0)})}$$
To maximize $Q_1$ subject to the constraint $\sum_{g=1}^{G} \pi_g = 1$ (equivalently $1 - \sum_g \pi_g = 0$), we introduce a Lagrange multiplier $\lambda$ and work with

$$Q_1 = \sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig}^{(1)} \log \pi_g + \lambda \left( 1 - \sum_{g=1}^{G} \pi_g \right)$$
Then maximizing $Q_1$ gives the values of $\pi_g$ based on $Z_{ig}^{(1)}$ (terms in the sum over $g$ with a different index vanish when differentiating with respect to $\pi_g$):

$$\frac{\partial Q_1}{\partial \pi_g} = \sum_{i=1}^{n} \frac{Z_{ig}^{(1)}}{\pi_g} - \lambda = 0 \quad \Rightarrow \quad \pi_g = \frac{\sum_{i=1}^{n} Z_{ig}^{(1)}}{\lambda}$$
$\lambda$ is also unknown; differentiating with respect to $\lambda$ recovers the constraint, and since each observation's posterior probabilities sum to one ($\sum_{g=1}^{G} Z_{ig}^{(1)} = 1$):

$$\frac{\partial Q_1}{\partial \lambda} = 1 - \sum_{g=1}^{G} \pi_g = 0 \;\Rightarrow\; 1 - \frac{\sum_{g=1}^{G} \sum_{i=1}^{n} Z_{ig}^{(1)}}{\lambda} = 1 - \frac{\sum_{i=1}^{n} \sum_{g=1}^{G} Z_{ig}^{(1)}}{\lambda} = 1 - \frac{\sum_{i=1}^{n} 1}{\lambda} = 1 - \frac{n}{\lambda} = 0 \;\therefore\; \hat{\lambda} = n$$

and
$$\hat{\pi}_g = \frac{\sum_{i=1}^{n} Z_{ig}^{(1)}}{n}, \qquad g = 1, \dots, G$$
This is an intuitive result: we use the estimated memberships to find the share of the sample that should come from each component. Note that $Z_{ig}^{(1)}$ is an expectation, a continuous number between 0 and 1, not binary like the true unobserved membership indicators.
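A one-line numpy sketch of this update, assuming `Z` is the $(n, G)$ matrix of posterior probabilities from the E-step (the function name is illustrative):

```python
import numpy as np

def update_pi(Z):
    """M-step for the mixing proportions: pi_g = sum_i Z_ig / n.

    Z: (n, G) posterior membership probabilities; rows sum to 1,
    so the returned proportions also sum to 1.
    """
    return Z.mean(axis=0)
```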
For $Q_2$, we now specialize to a Gaussian mixture of linear regressions, where in component $g$ the response is $y_i \mid \tilde{x}_i \sim N(\tilde{x}_i \tilde{\beta}_g, \sigma_g^2)$:

$$\begin{aligned} Q_2 &= \sum_{g=1}^{G} \sum_{i=1}^{n} Z_{ig}^{(1)} \log f(y_i; \tilde{x}_i, \tilde{\beta}_g, \sigma_g^2) = \sum_{g=1}^{G} \sum_{i=1}^{n} Z_{ig}^{(1)} \log\left( \frac{1}{\sqrt{2\pi\sigma_g^2}} \exp\left( -\frac{(y_i - \tilde{x}_i \tilde{\beta}_g)^2}{2\sigma_g^2} \right) \right) \\ &= \sum_{g=1}^{G} \sum_{i=1}^{n} Z_{ig}^{(1)} \left( -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log \sigma_g^2 - \frac{(y_i - \tilde{x}_i \tilde{\beta}_g)^2}{2\sigma_g^2} \right) \end{aligned}$$
With $\tilde{\beta}_g = (\beta_{0g}, \beta_{1g}, \dots, \beta_{pg})'$ for $p$ covariates and an intercept, the gradient is

$$\frac{\partial Q_2}{\partial \tilde{\beta}_g} = \left[ \frac{\partial Q_2}{\partial \beta_{0g}}, \frac{\partial Q_2}{\partial \beta_{1g}}, \dots, \frac{\partial Q_2}{\partial \beta_{pg}} \right]^T$$

For $j = 0, 1, \dots, p$:
$$\frac{\partial Q_2}{\partial \beta_{jg}} = \sum_{i=1}^{n} Z_{ig}^{(1)} \frac{\partial}{\partial \beta_{jg}} \left( -\frac{(y_i - \tilde{x}_i \tilde{\beta}_g)^2}{2\sigma_g^2} \right) = \frac{1}{\sigma_g^2} \sum_{i=1}^{n} Z_{ig}^{(1)} (y_i - \tilde{x}_i \tilde{\beta}_g) x_{ij} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \left[ Z_{ig}^{(1)} y_i x_{ij} - Z_{ig}^{(1)} \tilde{x}_i \tilde{\beta}_g x_{ij} \right] = 0$$

Bringing this back to vector form for all the $\beta$'s:
$$\frac{\partial Q_2}{\partial \tilde{\beta}_g} = \sum_{i=1}^{n} Z_{ig}^{(1)} (y_i - \tilde{x}_i \tilde{\beta}_g) \tilde{x}_i^T = 0 \;\Rightarrow\; \sum_{i=1}^{n} Z_{ig}^{(1)} y_i \tilde{x}_i^T = \sum_{i=1}^{n} Z_{ig}^{(1)} \tilde{x}_i \tilde{\beta}_g \tilde{x}_i^T$$

Rewriting in matrix form:
$$X^T Z_g^{(1)} \tilde{Y} = \left( X^T Z_g^{(1)} X \right) \tilde{\beta}_g \quad \Rightarrow \quad \tilde{\beta}_g^{(1)} = \hat{\tilde{\beta}}_g = \left( X^T Z_g^{(1)} X \right)^{-1} X^T Z_g^{(1)} \tilde{Y}$$

where

$$Z_g^{(1)} = \operatorname{diag}\left( Z_{1g}^{(1)}, \dots, Z_{ng}^{(1)} \right) = \begin{bmatrix} Z_{1g}^{(1)} & 0 & \cdots & 0 \\ 0 & Z_{2g}^{(1)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & Z_{ng}^{(1)} \end{bmatrix}$$
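These normal equations are exactly weighted least squares with weights $Z_{ig}^{(1)}$. A minimal numpy sketch (the function name is illustrative; `np.linalg.solve` is used in place of the explicit inverse for numerical stability):

```python
import numpy as np

def update_beta(X, y, z_g):
    """Weighted least squares M-step update for beta_g.

    X: (n, p+1) design matrix including the intercept column;
    y: (n,) responses; z_g: (n,) posterior probabilities Z_ig^(1).
    Solves (X' Z_g X) beta_g = X' Z_g y with Z_g = diag(z_g).
    """
    W = np.diag(z_g)  # the diagonal weight matrix Z_g^(1)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```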
Differentiating with respect to $\sigma_g^2$, only the terms for component $g$ survive, so we can isolate $g$ and work with a single sum over $i$:
$$\frac{\partial Q_2}{\partial \sigma_g^2} = \sum_{i=1}^{n} Z_{ig}^{(1)} \left[ -\frac{1}{2\sigma_g^2} + \frac{(y_i - \tilde{x}_i \tilde{\beta}_g)^2}{2(\sigma_g^2)^2} \right] = \sum_{i=1}^{n} Z_{ig}^{(1)} \left[ \frac{(y_i - \tilde{x}_i \tilde{\beta}_g)^2 - \sigma_g^2}{2(\sigma_g^2)^2} \right] = 0$$

$$\Rightarrow\; \sum_{i=1}^{n} Z_{ig}^{(1)} (y_i - \tilde{x}_i \tilde{\beta}_g)^2 - \sum_{i=1}^{n} Z_{ig}^{(1)} \sigma_g^2 = 0 \;\Rightarrow\; \sigma_g^2 = \frac{\sum_{i=1}^{n} Z_{ig}^{(1)} (y_i - \tilde{x}_i \tilde{\beta}_g)^2}{\sum_{i=1}^{n} Z_{ig}^{(1)}}$$

Plugging in the result for $\tilde{\beta}_g^{(1)}$ from the previous step, we have:
$$\sigma_g^{2(1)} = \hat{\sigma}_g^2 = \frac{(\tilde{Y} - X \tilde{\beta}_g^{(1)})^T Z_g^{(1)} (\tilde{Y} - X \tilde{\beta}_g^{(1)})}{\operatorname{trace}(Z_g^{(1)})}$$

where the trace sums all the $Z_{ig}^{(1)}$'s back.
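Putting the E-step and both M-step updates together, here is a compact sketch of the full iteration for the mixture of linear regressions; the function name, random initialization, and fixed iteration count are illustrative choices, not prescribed by the derivation.

```python
import numpy as np
from scipy.stats import norm

def em_mixture_regression(X, y, G, n_iter=100, seed=0):
    """EM for a G-component Gaussian mixture of linear regressions.

    Assumes y_i | x_i, Z_ig = 1 ~ N(x_i' beta_g, sigma_g^2), as in the
    derivation above. X: (n, p+1) design matrix with intercept column.
    """
    rng = np.random.default_rng(seed)
    n, p1 = X.shape
    pi = np.full(G, 1.0 / G)                  # start from equal proportions
    beta = rng.normal(size=(G, p1))           # (G, p+1) coefficient vectors
    sigma2 = np.full(G, np.var(y))
    for _ in range(n_iter):
        # E-step: Z[i, g] proportional to pi_g * N(y_i; x_i' beta_g, sigma_g^2)
        means = X @ beta.T                    # (n, G) component fitted values
        Z = pi[None, :] * norm.pdf(y[:, None], loc=means,
                                   scale=np.sqrt(sigma2)[None, :])
        Z /= Z.sum(axis=1, keepdims=True)
        # M-step
        pi = Z.mean(axis=0)                   # pi_g = sum_i Z_ig / n
        for g in range(G):
            w = Z[:, g]
            # Weighted least squares: (X' W X) beta_g = X' W y
            beta[g] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            resid = y - X @ beta[g]
            # sigma_g^2 = sum_i Z_ig * resid_i^2 / sum_i Z_ig  (trace of Z_g)
            sigma2[g] = np.sum(w * resid**2) / np.sum(w)
    return pi, beta, sigma2
```

In practice one would monitor the observed-data log-likelihood and stop when it changes by less than a tolerance, rather than running a fixed number of iterations.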