Minorization-Maximization Algorithm

Surrogate function

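A surrogate function $M(\theta \mid \theta_m)$ minorizes the objective $f$ at the current iterate $\theta_m$: it lies below $f$ everywhere and touches it at $\theta_m$,

$$M(\theta \mid \theta_m) \le f(\theta) \ \text{for all } \theta, \qquad M(\theta_m \mid \theta_m) = f(\theta_m).$$

Then find $\theta_{m+1} = \arg\max_{\theta} M(\theta \mid \theta_m)$ and build the next surrogate so that $M(\theta_{m+1} \mid \theta_{m+1}) = f(\theta_{m+1})$. Each step cannot decrease the objective, because $f(\theta_{m+1}) \ge M(\theta_{m+1} \mid \theta_m) \ge M(\theta_m \mid \theta_m) = f(\theta_m)$.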

Beta-Binomial Regression Example

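$$Y_i \sim \text{Binomial}(m_i, p_i)$$

where $Y_i$ is a count variable with

$$E(Y_i) = m_i p_i, \qquad \mathrm{Var}(Y_i) = m_i p_i (1 - p_i),$$

and $p_i$ is assumed to follow a Beta distribution, $p_i \sim \text{Beta}(\alpha_{i1}, \alpha_{i2})$. Writing $\alpha_{i+} = \alpha_{i1} + \alpha_{i2}$,

$$E(p_i) = \frac{\alpha_{i1}}{\alpha_{i1} + \alpha_{i2}} = \frac{\alpha_{i1}}{\alpha_{i+}}, \qquad \mathrm{Var}(p_i) = \frac{\alpha_{i1}\alpha_{i2}}{\alpha_{i+}^2 (1 + \alpha_{i+})}.$$

$\mathrm{Var}(p_i)$ depends on the sizes of $\alpha_{i1}$ and $\alpha_{i2}$: for a fixed mean, a larger $\alpha_{i+}$ gives a smaller variance.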

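Beta-binomial distribution of $Y_i$ after integrating out $p_i$

$$f_{BB}(y_i; \alpha_i) = \int_0^1 f_{Bin}(y_i; p_i)\, f_{Beta}(p_i; \alpha_i)\, dp_i = \frac{m_i!}{y_i!\,(m_i - y_i)!} \cdot \frac{\Gamma(\alpha_{i+})}{\Gamma(\alpha_{i1})\,\Gamma(\alpha_{i2})} \cdot \frac{\Gamma(y_i + \alpha_{i1})\,\Gamma(m_i - y_i + \alpha_{i2})}{\Gamma(m_i + \alpha_{i+})}$$

where $\alpha_{i+} = \alpha_{i1} + \alpha_{i2}$.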
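The density above can be evaluated on the log scale to avoid overflow in the Gamma functions. The following is a minimal sketch using only the standard library; the function name `betabinom_logpmf` and the parameter values are illustrative assumptions, not part of the notes.

```python
import math

def betabinom_logpmf(y, m, a1, a2):
    """Log of the beta-binomial pmf: m trials, Beta(a1, a2) mixing distribution."""
    # log of the binomial coefficient m! / (y! (m - y)!)
    log_choose = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    # log of Gamma(a+) / (Gamma(a1) Gamma(a2))
    log_beta_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    # log of Gamma(y + a1) Gamma(m - y + a2) / Gamma(m + a+)
    log_kernel = (math.lgamma(y + a1) + math.lgamma(m - y + a2)
                  - math.lgamma(m + a1 + a2))
    return log_choose + log_beta_norm + log_kernel

# Illustrative values (assumed): the probabilities over y = 0..m should sum to one.
m, a1, a2 = 10, 2.0, 3.5
total = sum(math.exp(betabinom_logpmf(y, m, a1, a2)) for y in range(m + 1))
```

With a single trial the success probability reduces to $\alpha_{i1}/\alpha_{i+}$, which gives a quick sanity check on the implementation.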

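The mean and variance of $Y_i$ under the beta-binomial distribution are then given by

$$E(Y_i) = E\big(E(Y_i \mid p_i)\big) = E(m_i p_i) = m_i \frac{\alpha_{i1}}{\alpha_{i+}}$$

$$\begin{aligned}
\mathrm{Var}(Y_i) &= \mathrm{Var}\big(E(Y_i \mid p_i)\big) + E\big(\mathrm{Var}(Y_i \mid p_i)\big) \\
&= \mathrm{Var}(m_i p_i) + E\big(m_i p_i (1 - p_i)\big) \\
&= m_i^2 \frac{\alpha_{i1}\alpha_{i2}}{\alpha_{i+}^2 (1 + \alpha_{i+})} + E(m_i p_i) - E(m_i p_i^2) \\
&= m_i \frac{\alpha_{i1}}{\alpha_{i+}} \left(1 - \frac{\alpha_{i1}}{\alpha_{i+}}\right) \frac{m_i + \alpha_{i+}}{1 + \alpha_{i+}}
\end{aligned}$$

The likelihood contribution of observation $i$ is

$$L(\beta; x_i) = f_{BB}(y_i; \alpha_i),$$

where $\alpha_i$ depends on $\beta$ through the covariates $x_i$.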
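The closed-form moments can be checked numerically by enumerating the beta-binomial probabilities. This is a small sketch under assumed parameter values; `betabinom_pmf` is a hypothetical helper written for this check.

```python
import math

def betabinom_pmf(y, m, a1, a2):
    """Beta-binomial pmf computed through log-Gamma functions."""
    log_p = (math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
             + math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
             + math.lgamma(y + a1) + math.lgamma(m - y + a2)
             - math.lgamma(m + a1 + a2))
    return math.exp(log_p)

# Illustrative values (assumed).
m, a1, a2 = 12, 1.5, 4.0
a_plus = a1 + a2

# Moments by direct enumeration over y = 0..m.
probs = [betabinom_pmf(y, m, a1, a2) for y in range(m + 1)]
mean_enum = sum(y * p for y, p in enumerate(probs))
var_enum = sum((y - mean_enum) ** 2 * p for y, p in enumerate(probs))

# Moments from the closed-form expressions.
mean_formula = m * a1 / a_plus
var_formula = (m * (a1 / a_plus) * (1 - a1 / a_plus)
               * (m + a_plus) / (1 + a_plus))
```

The factor $(m_i + \alpha_{i+})/(1 + \alpha_{i+})$ is the overdispersion relative to a binomial with the same mean; it equals 1 when $m_i = 1$ and tends to 1 as $\alpha_{i+} \to \infty$.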

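Here $y_{i1} = y_i$ is the number of successes in the sample of $m_i$ trials and $y_{i2} = m_i - y_i$ is the number of failures. With $\alpha_{id} = e^{x_i^\top \beta_d}$ for $d = 1, 2$, and using $\Gamma(\alpha + y)/\Gamma(\alpha) = \prod_{l=0}^{y-1} (\alpha + l)$, the log-likelihood is, up to the constant $\sum_i \log \binom{m_i}{y_i}$,

$$\ell(\beta) = \sum_{i=1}^n \log f_{BB}(y_i; \beta) = \sum_{i=1}^n \left[ \sum_{l=0}^{y_{i1}-1} \log\big(e^{x_i^\top \beta_1} + l\big) + \sum_{l=0}^{m_i - y_{i1} - 1} \log\big(e^{x_i^\top \beta_2} + l\big) - \sum_{l=0}^{m_i - 1} \log\big(e^{x_i^\top \beta_1} + e^{x_i^\top \beta_2} + l\big) \right]$$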
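The sum-of-logs form follows from the identity $\Gamma(\alpha + y)/\Gamma(\alpha) = \prod_{l=0}^{y-1}(\alpha + l)$ applied to each Gamma ratio in the beta-binomial density. The sketch below compares it with the log-Gamma form for one observation, assuming the log link $\alpha_d = \exp(\eta_d)$ with $\eta_d$ standing in for $x_i^\top \beta_d$; both function names are hypothetical, and the binomial coefficient is omitted from both.

```python
import math

def loglik_sums(y, m, eta1, eta2):
    """Log-likelihood kernel for one observation as sums of logs (no binomial coefficient)."""
    a1, a2 = math.exp(eta1), math.exp(eta2)
    # f1: terms from Gamma(y + a1)/Gamma(a1) and Gamma(m - y + a2)/Gamma(a2)
    f1 = (sum(math.log(a1 + l) for l in range(y))
          + sum(math.log(a2 + l) for l in range(m - y)))
    # f2: term from Gamma(a1 + a2)/Gamma(m + a1 + a2)
    f2 = -sum(math.log(a1 + a2 + l) for l in range(m))
    return f1 + f2

def loglik_gamma(y, m, eta1, eta2):
    """The same quantity written with log-Gamma functions."""
    a1, a2 = math.exp(eta1), math.exp(eta2)
    return (math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
            + math.lgamma(y + a1) + math.lgamma(m - y + a2)
            - math.lgamma(m + a1 + a2))

# Illustrative values (assumed).
diff = loglik_sums(3, 8, 0.3, -0.4) - loglik_gamma(3, 8, 0.3, -0.4)
```

The sum form is the one the MM derivation works with, since each summand can be minorized on its own.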

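The log-likelihood is not concave in $\beta$: each term $\log(e^{x_i^\top \beta_d} + l)$ in the first two sums is convex in $\beta_d$. It is therefore minorized piece by piece.

Jensen's inequality states that, for a concave function $\phi(\cdot)$ and constants $a_1, a_2 \ge 0$ with $a_1 + a_2 = 1$,

$$\phi(a_1 v_1 + a_2 v_2) \ge a_1 \phi(v_1) + a_2 \phi(v_2).$$

Here the constants are evaluated at the current iterate $\beta^{(t)}$:

$$a_1 = \frac{\exp(x_i^\top \beta_d^{(t)})}{\exp(x_i^\top \beta_d^{(t)}) + l}, \qquad a_2 = 1 - a_1 = \frac{l}{\exp(x_i^\top \beta_d^{(t)}) + l}.$$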

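Surrogate of the first function $f_1$

Take $\phi = \log$, $a_1 = \exp(x_i^\top \beta_1^{(t)}) / \big(\exp(x_i^\top \beta_1^{(t)}) + l\big)$, $a_2 = 1 - a_1$, $v_1 = e^{x_i^\top \beta_1} / a_1$ and $v_2 = l / a_2$. Then

$$\log\big(e^{x_i^\top \beta_1} + l\big) = \log(a_1 v_1 + a_2 v_2) \ge a_1 \log v_1 + a_2 \log v_2 = \frac{\exp(x_i^\top \beta_1^{(t)})}{\exp(x_i^\top \beta_1^{(t)}) + l}\, x_i^\top \beta_1 + c^{(t)},$$

where $c^{(t)}$ does not depend on $\beta$. For $l = 0$ the weight $a_1$ is 1 and the relation holds with equality. The same bound applies to the $\beta_2$ terms, so

$$f_1(\beta) \ge g_1(\beta \mid \beta^{(t)}) = \sum_{d=1}^2 \sum_{i=1}^n \sum_{l=0}^{y_{id}-1} \frac{\exp(x_i^\top \beta_d^{(t)})}{\exp(x_i^\top \beta_d^{(t)}) + l}\, x_i^\top \beta_d + c_1^{(t)}.$$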
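The Jensen bound can be checked numerically for a single term. The sketch below writes $\eta$ for the linear predictor $x_i^\top \beta_1$ and $\eta^{(t)}$ for its value at the current iterate; the function names and grid values are assumptions made for illustration.

```python
import math

def target(eta, l):
    """One term of f1: log(exp(eta) + l)."""
    return math.log(math.exp(eta) + l)

def jensen_surrogate(eta, eta_t, l):
    """Linear minorizer of log(exp(eta) + l), anchored at eta_t, for l >= 0."""
    a_t = math.exp(eta_t)
    w = a_t / (a_t + l)                      # the weight a_1; equals 1 when l == 0
    const = math.log(a_t + l) - w * eta_t    # makes both sides agree at eta_t
    return w * eta + const

# Gap between the target and its surrogate on a grid; it should never be negative.
eta_t = 0.7
gaps = [target(k / 10, l) - jensen_surrogate(k / 10, eta_t, l)
        for l in (0, 1, 5) for k in range(-30, 31)]
```

The surrogate is linear in $\eta$, which is what later lets the combined surrogate take the form of a Poisson-type log-likelihood.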

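Surrogate of the second function $f_2$

$$f_2(\beta) = -\sum_{i=1}^n \sum_{l=0}^{m_i - 1} \log\big(e^{x_i^\top \beta_1} + e^{x_i^\top \beta_2} + l\big)$$

Because $\log(\cdot)$ is concave, $-\log(\cdot)$ is convex, so choose $\phi(\cdot) = -\log(\cdot)$. A convex function lies above its tangent line at any point. With $v = e^{x_i^\top \beta_1} + e^{x_i^\top \beta_2} + l$ and $v^{(t)}$ its value at $\beta^{(t)}$,

$$\begin{aligned}
-\log(v) &\ge -\log(v^{(t)}) + \left.\frac{d(-\log v)}{dv}\right|_{v = v^{(t)}} (v - v^{(t)}) \\
&= -\log(v^{(t)}) - \frac{1}{v^{(t)}} (v - v^{(t)}) \\
&= -\log\big(e^{x_i^\top \beta_1^{(t)}} + e^{x_i^\top \beta_2^{(t)}} + l\big) - \frac{e^{x_i^\top \beta_1} + e^{x_i^\top \beta_2} - e^{x_i^\top \beta_1^{(t)}} - e^{x_i^\top \beta_2^{(t)}}}{e^{x_i^\top \beta_1^{(t)}} + e^{x_i^\top \beta_2^{(t)}} + l}
\end{aligned}$$

Summing over $i$ and $l$ and collecting the terms that do not depend on $\beta$ into $c_2^{(t)}$,

$$f_2(\beta) \ge g_2(\beta \mid \beta^{(t)}) = -\sum_{d=1}^2 \sum_{i=1}^n \sum_{l=0}^{m_i - 1} \frac{\exp(x_i^\top \beta_d)}{\sum_{d'=1}^2 \exp(x_i^\top \beta_{d'}^{(t)}) + l} + c_2^{(t)}.$$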
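The tangent-line (supporting hyperplane) bound can be checked the same way for a single term of $f_2$. In this sketch $\eta_1, \eta_2$ stand for $x_i^\top \beta_1, x_i^\top \beta_2$; the function names and the anchor values are assumptions chosen for illustration.

```python
import math

def neg_log_term(eta1, eta2, l):
    """One term of f2: -log(exp(eta1) + exp(eta2) + l)."""
    return -math.log(math.exp(eta1) + math.exp(eta2) + l)

def tangent_surrogate(eta1, eta2, eta1_t, eta2_t, l):
    """First-order minorizer of -log(v) in v, expanded at the current iterate."""
    v = math.exp(eta1) + math.exp(eta2) + l
    v_t = math.exp(eta1_t) + math.exp(eta2_t) + l
    return -math.log(v_t) - (v - v_t) / v_t

# Gap between the term and its surrogate on a grid; it should never be negative.
anchor = (0.2, -0.5)
gaps = [neg_log_term(j / 5, k / 5, l) - tangent_surrogate(j / 5, k / 5, *anchor, l)
        for l in (0, 2, 7) for j in range(-10, 11) for k in range(-10, 11)]
```

Because the surrogate depends on $\beta_1$ and $\beta_2$ only through the separate terms $e^{\eta_1}$ and $e^{\eta_2}$, the two coefficient vectors decouple in the maximization step.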

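In combination, we obtain a surrogate function $g(\beta \mid \beta^{(t)})$ for the objective $f(\beta)$:

$$\begin{aligned}
f(\beta) &= f_1(\beta) + f_2(\beta) \\
&\ge g(\beta \mid \beta^{(t)}) = g_1(\beta \mid \beta^{(t)}) + g_2(\beta \mid \beta^{(t)}) \\
&= \sum_{d=1}^2 \sum_{i=1}^n \left( \sum_{l=0}^{y_{id}-1} \frac{\exp(x_i^\top \beta_d^{(t)})}{\exp(x_i^\top \beta_d^{(t)}) + l}\, x_i^\top \beta_d - \sum_{l=0}^{m_i - 1} \frac{1}{\sum_{d'=1}^2 \exp(x_i^\top \beta_{d'}^{(t)}) + l}\, \exp(x_i^\top \beta_d) \right) + C^{(t)} \\
&= \left[ \sum_{d=1}^2 \sum_{i=1}^n W_i^{(t)} \Big( y_{id}^{(t)}\, x_i^\top \beta_d - \exp(x_i^\top \beta_d) \Big) \right] + C^{(t)}
\end{aligned}$$

where

$$W_i^{(t)} = \sum_{l=0}^{m_i - 1} \frac{1}{\sum_{d'=1}^2 \exp(x_i^\top \beta_{d'}^{(t)}) + l}, \qquad y_{id}^{(t)} = \frac{1}{W_i^{(t)}} \sum_{l=0}^{y_{id}-1} \frac{\exp(x_i^\top \beta_d^{(t)})}{\exp(x_i^\top \beta_d^{(t)}) + l}.$$

Up to the constant, this is a weighted Poisson regression log-likelihood in each $\beta_d$, with weights $W_i^{(t)}$ and pseudo-responses $y_{id}^{(t)}$, so $\beta_1$ and $\beta_2$ can be updated separately.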
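To see the surrogate in action, here is a minimal sketch of the MM iteration under a simplifying assumption: an intercept-only model ($x_i = 1$ for every $i$, so $\beta_1$ and $\beta_2$ are scalars). In that case setting the derivative of the surrogate to zero gives the closed-form update $\exp(\beta_d) = \sum_i W_i^{(t)} y_{id}^{(t)} / \sum_i W_i^{(t)}$. The data values and the function names `loglik` and `mm_step` are made up for illustration; with real covariates each update would instead need a weighted Poisson regression fit, for example by Newton's method.

```python
import math

def loglik(data, eta1, eta2):
    """Beta-binomial log-likelihood (without binomial coefficients), alpha_d = exp(eta_d)."""
    a1, a2 = math.exp(eta1), math.exp(eta2)
    total = 0.0
    for y, m in data:
        total += sum(math.log(a1 + l) for l in range(y))
        total += sum(math.log(a2 + l) for l in range(m - y))
        total -= sum(math.log(a1 + a2 + l) for l in range(m))
    return total

def mm_step(data, eta1, eta2):
    """One MM update for the intercept-only model (closed-form surrogate maximizer)."""
    a1, a2 = math.exp(eta1), math.exp(eta2)
    w_sum = s1_sum = s2_sum = 0.0
    for y, m in data:
        # W_i: weight from the tangent bound on the -log terms
        w_sum += sum(1.0 / (a1 + a2 + l) for l in range(m))
        # W_i * y_id: Jensen weights summed over successes (d=1) and failures (d=2)
        s1_sum += sum(a1 / (a1 + l) for l in range(y))
        s2_sum += sum(a2 / (a2 + l) for l in range(m - y))
    # Requires at least one success and one failure in the data, else log(0).
    return math.log(s1_sum / w_sum), math.log(s2_sum / w_sum)

# Made-up (successes, trials) pairs, for illustration only.
data = [(3, 10), (7, 10), (1, 8), (5, 12), (9, 10), (0, 6)]

eta = (0.0, 0.0)                      # starting values (assumed)
history = [loglik(data, *eta)]
for _ in range(50):
    eta = mm_step(data, *eta)
    history.append(loglik(data, *eta))
```

The values in `history` should never decrease from one iteration to the next, which is the ascent property guaranteed by the minorization.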