Gaussian Finite Mixture Model

Finite mixture models address the situation in which a sample is drawn from a population consisting of a finite number of components (subpopulations).

For a $G$-component mixture, the likelihood takes the common form

$$L(\Psi)=\prod_{i=1}^{n} f(x_i;\Psi)=\prod_{i=1}^{n}\sum_{g=1}^{G}\pi_g f_g(x_i;\theta_g),$$

where $\Psi=(\pi_1,\pi_2,\dots,\pi_{G-1},\theta_1,\dots,\theta_G)$ and $\pi_G=1-\sum_{g=1}^{G-1}\pi_g$.
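The mixture likelihood above can be evaluated directly. The following is a minimal sketch, assuming univariate Gaussian components and made-up parameter values (the function names and all numbers are illustrative, not from the notes):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density f_g(x; theta_g) of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def mixture_loglik(xs, pis, mus, sigma2s):
    """log L(Psi) = sum_i log( sum_g pi_g * f_g(x_i; theta_g) )."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, s2)
                     for p, m, s2 in zip(pis, mus, sigma2s)))
        for x in xs
    )

# Two well-separated hypothetical components with equal mixing proportions.
ll = mixture_loglik([-1.2, 0.1, 3.8, 4.5], [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
```

Note that the log of the likelihood is computed: the sum over $g$ sits inside the logarithm, which is exactly why no closed-form maximizer exists.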


The conventional method of maximum likelihood does not yield an explicit solution for the parameters: setting the score with respect to $\pi_g$ to zero gives

$$\frac{\partial l}{\partial \pi_g}=\sum_{i=1}^{n}\frac{f_g(x_i;\theta_g)-f_G(x_i;\theta_G)}{\sum_{h=1}^{G}\pi_h f_h(x_i;\theta_h)}=0,\qquad g=1,\dots,G-1,$$

and since the $\pi_g$'s appear inside the sum in the denominator, there is no closed-form solution.

To use the Expectation-Maximization (EM) algorithm, we recast this as an incomplete-data problem. The observed data are $X$ and the unobserved data are the component memberships of the observations:

$$z=(z_1,\dots,z_n),\qquad y_i=(x_i,z_i).$$

Define the component-membership indicators

$$Z_{ig}=\begin{cases}1,&\text{if the }i\text{th observation is from the }g\text{th component}\\0,&\text{otherwise.}\end{cases}$$

Then $Z_i=(Z_{i1},\dots,Z_{iG})$ follows a Multinomial$(1,\pi)$ distribution: the mixing proportions $\pi_g$ act as prior membership probabilities, and the E-step below computes the corresponding posteriors.

Now the complete-data likelihood for Ψ has the form

$$L_c(\Psi;y)=\prod_{i=1}^{n} f(y_i;\Psi)=\prod_{i=1}^{n} f(x_i,z_i;\Psi)=\prod_{i=1}^{n}\prod_{g=1}^{G}\left[\pi_g f_g(x_i;\theta_g)\right]^{z_{ig}}$$

Taking logarithms gives the complete-data log-likelihood, from which we can formulate a general E-step and M-step for fitting a mixture model:

$$l_c(\Psi;y)=\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}\log\left[\pi_g f_g(x_i;\theta_g)\right]=\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}\left[\log\pi_g+\log f_g(x_i;\theta_g)\right]$$

E-step

Given the current estimate $\Psi^{(0)}=(\pi^{(0)},\theta^{(0)})$, estimate the posterior membership probabilities $Z_{ig}^{(1)}$ using Bayes' theorem:

$$Z_{ig}^{(1)}=E\left(Z_{ig}\mid x_i,\Psi^{(0)}\right)=P\left(Z_{ig}=1\mid x_i,\Psi^{(0)}\right)=\frac{P\left(x_i\mid Z_{ig}=1,\Psi^{(0)}\right)P\left(Z_{ig}=1\mid\Psi^{(0)}\right)}{P\left(x_i\mid\Psi^{(0)}\right)}=\frac{\pi_g^{(0)} f_g(x_i;\theta_g^{(0)})}{\sum_{h=1}^{G}\pi_h^{(0)} f_h(x_i;\theta_h^{(0)})}$$

This posterior expectation is exactly what is needed in the $Q$ function:

$$\begin{aligned}Q(\Psi;\Psi^{(0)})&=E\left(l_c(\Psi;y)\mid\Psi^{(0)}\right)\\&=E\left(\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}\left[\log\pi_g+\log f_g(x_i;\theta_g)\right]\;\Big|\;\pi^{(0)},\theta^{(0)}\right)\\&=\sum_{i=1}^{n}\sum_{g=1}^{G}E\left(Z_{ig}\mid\pi^{(0)},\theta^{(0)}\right)\left[\log\pi_g+\log f_g(x_i;\theta_g)\right],\end{aligned}$$

with

$$Z_{ig}^{(1)}=E\left(Z_{ig}\mid\pi^{(0)},\theta^{(0)}\right)=\frac{\pi_g^{(0)} f_g(x_i;\theta_g^{(0)})}{\sum_{h=1}^{G}\pi_h^{(0)} f_h(x_i;\theta_h^{(0)})}$$
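The E-step formula lends itself to a direct implementation. A minimal sketch, again assuming univariate Gaussian components with made-up parameter values:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def e_step(xs, pis, mus, sigma2s):
    """Posterior membership probabilities:
    Z_ig = pi_g * f_g(x_i) / sum_h pi_h * f_h(x_i)."""
    Z = []
    for x in xs:
        num = [p * normal_pdf(x, m, s2) for p, m, s2 in zip(pis, mus, sigma2s)]
        total = sum(num)  # the denominator sum_h pi_h * f_h(x_i)
        Z.append([v / total for v in num])
    return Z

# One point near each hypothetical component mean (0 and 4).
Z = e_step([-1.0, 4.1], [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
```

Each row of `Z` sums to one, and an observation near a component's mean receives almost all of its posterior mass from that component.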

M-step

Find $\Psi^{(1)}=(\pi^{(1)},\theta^{(1)})$ that maximizes $Q(\Psi;\Psi^{(0)})$, treating the $Z_{ig}^{(1)}$ as fixed.

$$\begin{aligned}Q(\Psi;\Psi^{(0)})&=\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}^{(1)}\left[\log\pi_g+\log f_g(x_i;\theta_g)\right]\\&=\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}^{(1)}\log\pi_g+\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}^{(1)}\log f_g(x_i;\theta_g)\\&=Q_1(\pi;\pi^{(0)})+Q_2(\theta;\theta^{(0)})\end{aligned}$$

Maximizing Q1

$$Q_1=\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}^{(1)}\log\pi_g+\lambda\left(1-\sum_{g=1}^{G}\pi_g\right),$$

where the scalar Lagrange multiplier $\lambda$ enforces the constraint $\sum_{g=1}^{G}\pi_g=1$, i.e. $1-\sum_g\pi_g=0$.

Maximizing $Q_1$ gives the values of $\pi_g$ in terms of the $Z_{ig}^{(1)}$:

$$\frac{\partial Q_1}{\partial\pi_g}=\frac{\sum_{i=1}^{n}Z_{ig}^{(1)}}{\pi_g}-\lambda=0\ \Longrightarrow\ \pi_g=\frac{\sum_{i=1}^{n}Z_{ig}^{(1)}}{\lambda}$$

$\lambda$ is itself unknown, so we solve for the value that satisfies the constraint:

$$\frac{\partial Q_1}{\partial\lambda}=1-\sum_{g=1}^{G}\pi_g=0\ \Longrightarrow\ 1=\sum_{g=1}^{G}\frac{\sum_{i=1}^{n}Z_{ig}^{(1)}}{\lambda}=\frac{\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}^{(1)}}{\lambda}=\frac{\sum_{i=1}^{n}1}{\lambda}=\frac{n}{\lambda}$$

Hence $\hat\lambda=n$ and

$$\hat\pi_g=\frac{\sum_{i=1}^{n}Z_{ig}^{(1)}}{n},\qquad g=1,\dots,G.$$
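The resulting update is just a column average of the responsibility matrix. A minimal sketch, with responsibility values made up for illustration:

```python
def update_pis(Z):
    """M-step for the mixing proportions: pi_g = (1/n) * sum_i Z_ig."""
    n = len(Z)
    return [sum(row[g] for row in Z) / n for g in range(len(Z[0]))]

# Two observations, two components; each row of Z sums to 1.
pis = update_pis([[0.9, 0.1], [0.5, 0.5]])
```

Because each row of the responsibility matrix sums to one, the updated proportions automatically sum to one as well.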

Maximizing Q2

Specializing to a Gaussian mixture of linear regressions, with response $y_i$, covariate vector $\tilde{x}_i$, and component parameters $\theta_g=(\tilde\beta_g,\sigma_g^2)$:

$$\begin{aligned}Q_2&=\sum_{i=1}^{n}\sum_{g=1}^{G}Z_{ig}^{(1)}\log f_g(x_i;\theta_g)=\sum_{g=1}^{G}\sum_{i=1}^{n}Z_{ig}^{(1)}\log f(y_i;x_i,\beta_g,\sigma_g^2)\\&=\sum_{g=1}^{G}\sum_{i=1}^{n}Z_{ig}^{(1)}\log\left(\frac{1}{\sqrt{2\pi\sigma_g^2}}\exp\left(-\frac{(y_i-\tilde{x}_i^T\tilde\beta_g)^2}{2\sigma_g^2}\right)\right)\\&=\sum_{g=1}^{G}\sum_{i=1}^{n}Z_{ig}^{(1)}\left(-\frac{1}{2}\log(2\pi)-\frac{1}{2}\log\sigma_g^2-\frac{(y_i-\tilde{x}_i^T\tilde\beta_g)^2}{2\sigma_g^2}\right)\end{aligned}$$

Estimating $\tilde\beta_g$:

With $\tilde\beta_g=(\beta_{0g},\beta_{1g},\dots,\beta_{pg})^T$ for $p$ covariates and an intercept:

$$\frac{\partial Q_2}{\partial\tilde\beta_g}=\left[\frac{\partial Q_2}{\partial\beta_{0g}},\frac{\partial Q_2}{\partial\beta_{1g}},\dots,\frac{\partial Q_2}{\partial\beta_{pg}}\right]^T$$

For each $j=0,1,\dots,p$:

$$\begin{aligned}\frac{\partial Q_2}{\partial\beta_{jg}}&=\sum_{i=1}^{n}Z_{ig}^{(1)}\frac{\partial}{\partial\beta_{jg}}\left(-\frac{(y_i-\tilde{x}_i^T\tilde\beta_g)^2}{2\sigma_g^2}\right)=\frac{1}{\sigma_g^2}\sum_{i=1}^{n}Z_{ig}^{(1)}(y_i-\tilde{x}_i^T\tilde\beta_g)x_{ij}=0\\&\Longrightarrow\ \sum_{i=1}^{n}\left[Z_{ig}^{(1)}y_i x_{ij}-Z_{ig}^{(1)}\tilde{x}_i^T\tilde\beta_g x_{ij}\right]=0\end{aligned}$$

Collecting the $p+1$ equations back into vector form:

$$\frac{\partial Q_2}{\partial\tilde\beta_g}=\sum_{i=1}^{n}Z_{ig}^{(1)}(y_i-\tilde{x}_i^T\tilde\beta_g)\tilde{x}_i=0\ \Longrightarrow\ \sum_{i=1}^{n}Z_{ig}^{(1)}y_i\tilde{x}_i=\sum_{i=1}^{n}Z_{ig}^{(1)}\tilde{x}_i\tilde{x}_i^T\tilde\beta_g$$

Rewriting it in matrix form:

$$X^TZ_g^{(1)}\tilde{Y}=\left(X^TZ_g^{(1)}X\right)\tilde\beta_g\ \Longrightarrow\ \tilde\beta_g^{(1)}=\hat{\tilde\beta}_g=\left(X^TZ_g^{(1)}X\right)^{-1}X^TZ_g^{(1)}\tilde{Y}$$

where

$$Z_g^{(1)}=\operatorname{diag}\left(Z_{1g}^{(1)},\dots,Z_{ng}^{(1)}\right)=\begin{bmatrix}Z_{1g}^{(1)}&0&\cdots&0\\0&Z_{2g}^{(1)}&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&Z_{ng}^{(1)}\end{bmatrix}$$
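For a single covariate ($p=1$), the matrix update $(X^TZ_g^{(1)}X)^{-1}X^TZ_g^{(1)}\tilde Y$ reduces to weighted least squares with weights $w_i=Z_{ig}^{(1)}$, using the standard weighted-means formulas. A sketch under that single-covariate assumption, with hypothetical data:

```python
def wls_update(xs, ys, w):
    """Weighted least squares for one component: minimizes
    sum_i w_i * (y_i - b0 - b1 * x_i)^2, the p = 1 case of
    beta_g = (X'Z_g X)^{-1} X'Z_g Y with w_i = Z_ig."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw      # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw      # weighted mean of y
    b1 = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
          / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs)))
    b0 = ybar - b1 * xbar
    return b0, b1

# With unit weights this is ordinary least squares on y = 1 + 2x.
b0, b1 = wls_update([0, 1, 2], [1, 3, 5], [1, 1, 1])
```

Setting a weight to zero drops that observation from component $g$'s fit, which is how the responsibilities softly partition the data among the regression lines.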

Estimating $\sigma_g^2$:

Differentiating with respect to $\sigma_g^2$ isolates component $g$, so we work with a single sum:

$$\begin{aligned}\frac{\partial Q_2}{\partial\sigma_g^2}&=\sum_{i=1}^{n}Z_{ig}^{(1)}\left[-\frac{1}{2}\cdot\frac{1}{\sigma_g^2}+\frac{(y_i-\tilde{x}_i^T\tilde\beta_g)^2}{2(\sigma_g^2)^2}\right]=0\\&\Longrightarrow\ \sum_{i=1}^{n}\left[Z_{ig}^{(1)}(y_i-\tilde{x}_i^T\tilde\beta_g)^2-Z_{ig}^{(1)}\sigma_g^2\right]=0\\&\Longrightarrow\ \sigma_g^2=\frac{\sum_{i=1}^{n}Z_{ig}^{(1)}(y_i-\tilde{x}_i^T\tilde\beta_g)^2}{\sum_{i=1}^{n}Z_{ig}^{(1)}}\end{aligned}$$

Plugging in the result for $\tilde\beta_g^{(1)}$ from the previous step, we have:

$$\sigma_g^{2(1)}=\hat\sigma_g^2=\frac{\left(\tilde{Y}-X\tilde\beta_g^{(1)}\right)^TZ_g^{(1)}\left(\tilde{Y}-X\tilde\beta_g^{(1)}\right)}{\operatorname{trace}\left(Z_g^{(1)}\right)}$$

where $\operatorname{trace}(Z_g^{(1)})=\sum_{i=1}^{n}Z_{ig}^{(1)}$ recovers the sum of the responsibilities.
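The variance update is thus a responsibility-weighted mean of squared residuals. A minimal sketch for the single-covariate case, with hypothetical data and coefficients:

```python
def update_sigma2(xs, ys, w, b0, b1):
    """sigma_g^2 = sum_i w_i * (y_i - b0 - b1*x_i)^2 / sum_i w_i,
    with w_i = Z_ig the responsibilities for component g."""
    rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, xs, ys))
    return rss / sum(w)

# Residuals 0 and 1 with unit weights give a weighted MSE of 0.5.
s2 = update_sigma2([0, 1], [0, 2], [1, 1], 0.0, 1.0)
```

A perfect fit yields a variance of zero, so in practice a small lower bound on $\sigma_g^2$ is often imposed to keep the mixture likelihood bounded.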