Bayesian estimation

Based upon the view of probability as a 'degree of belief', we consider the parameter θ to be a random variable that follows a distribution. Before we see the data, we make an assumption about the distribution that θ follows. This distribution is called the prior distribution. We denote the prior probability distribution function of θ by g(θ).

We update the information about the parameter carried by the prior distribution with the information in the data, via the likelihood function, to obtain a distribution that describes the parameter better. This distribution is called the posterior distribution.

Prior distribution of θ: $\theta \sim g(\theta)$, the unconditional distribution of θ, i.e. the marginal distribution of θ

The likelihood of θ given the data:

$$x_1,\dots,x_n \sim f(x;\theta), \qquad L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = f(x_1,\dots,x_n;\theta)$$

Posterior distribution of θ: $h(\theta \mid \tilde{x})$, the conditional distribution of θ given the data

$$h(\theta \mid \tilde{x}) = \frac{f(x_1,\dots,x_n,\theta)}{f(x_1,\dots,x_n)} = \frac{\text{joint distribution of data and } \theta}{\text{marginal distribution of data, free of } \theta}$$

where θ is treated as a random variable, not as a given value.

The marginal distribution of the data is called the normalizing constant; it ensures that the posterior integrates to 1:

$$\int h(\theta \mid x)\,d\theta = \int \frac{f(x,\theta)}{f(x)}\,d\theta = \frac{1}{f(x)} \int f(x,\theta)\,d\theta = \frac{f(x)}{f(x)} = 1$$

Given the posterior distribution of θ, the posterior mean or the posterior median is a typical Bayesian estimate of θ. The posterior mean of θ is

$$\hat{\theta}_B = E(\theta \mid x_1,\dots,x_n) = \int \theta\, h(\theta \mid x)\,d\theta$$
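As a minimal numerical sketch of this recipe (posterior = prior × likelihood / normalizing constant, then the posterior mean as an integral), the code below works on a grid. The model and data are assumptions made only for illustration: i.i.d. Exponential(rate = θ) observations with a Gamma(2, 1) prior on θ; they are not part of the notes.

```python
import numpy as np

# Hypothetical toy model for illustration: x_i ~ Exponential(rate = theta) i.i.d.,
# prior theta ~ Gamma(2, 1); everything is evaluated on a grid over theta.
x = np.array([0.8, 1.3, 0.5])                  # made-up observed data
theta = np.linspace(1e-4, 20.0, 20000)         # grid over the parameter
dtheta = theta[1] - theta[0]

prior = theta * np.exp(-theta)                       # g(theta) = theta * e^{-theta}
lik = theta ** len(x) * np.exp(-theta * x.sum())     # L(theta) = prod_i theta * e^{-theta x_i}

joint = lik * prior                                  # f(x, theta) = L(theta) g(theta)
fx = np.sum(joint) * dtheta                          # normalizing constant f(x)
posterior = joint / fx                               # h(theta | x), integrates to 1

post_mean = np.sum(theta * posterior) * dtheta       # Bayes estimate: posterior mean
# Gamma prior is conjugate to the exponential likelihood, so the exact posterior
# mean is (2 + n) / (1 + sum(x)); the grid result should match it closely.
print(post_mean, (2 + len(x)) / (1 + x.sum()))
```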

A prior distribution that leads to a posterior distribution in the same distribution family is called a conjugate prior; the distribution family that both the prior and the posterior belong to is called the conjugate family.

The conjugate family has a nice property: the posterior follows a known form of distribution, so it can be written down without evaluating the normalizing integral.

Note: if the prior is highly informative, it can have more impact on the estimator than the data, especially when the sample size is small.

If it is hard to specify the parameters of the prior distribution, we can use a flat/vague prior.

Examples

$X \sim \text{Uniform}(0,\theta)$, and we assume that $\theta \sim \text{Gamma}(2,1)$.

$$
\begin{aligned}
\text{(prior dist'n)}\quad & g(\theta) = \frac{1}{\Gamma(2)\,1^{2}}\,\theta^{2-1}e^{-\theta/1} = \theta e^{-\theta}, \qquad \theta > 0\\[4pt]
\text{(likelihood, single observation } x\text{)}\quad & L(\theta) = \frac{1}{\theta}, \qquad 0 < x < \theta\\[4pt]
\text{(posterior dist'n)}\quad & h(\theta \mid x) = \frac{f(x,\theta)}{f(x)} = \frac{f(x;\theta)\,g(\theta)}{f(x)} = \frac{L(\theta)\,g(\theta)}{f(x)}\\[4pt]
& L(\theta)\,g(\theta) = \frac{1}{\theta}\cdot\theta e^{-\theta} = e^{-\theta}\\[4pt]
& f(x) = \int_x^{\infty} e^{-\theta}\,d\theta = e^{-x}, \qquad x > 0\\[4pt]
& h(\theta \mid x) = \frac{L(\theta)\,g(\theta)}{f(x)} = \frac{e^{-\theta}}{e^{-x}} = e^{x-\theta}, \qquad \theta > x\\[8pt]
\hat{\theta}_B = E(\theta \mid x) &= \int_x^{\infty} \theta\, h(\theta \mid x)\,d\theta = \int_x^{\infty} \theta e^{x-\theta}\,d\theta = e^{x}\int_x^{\infty} \theta e^{-\theta}\,d\theta\\
&= e^{x}\left[\theta\left(-e^{-\theta}\right)\Big|_x^{\infty} + \int_x^{\infty} e^{-\theta}\,d\theta\right] = e^{x}\left[x e^{-x} + e^{-x}\right]\\[4pt]
\hat{\theta}_B &= x + 1
\end{aligned}
$$
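As a quick numerical sanity check of this result, the sketch below integrates the posterior $h(\theta \mid x) = e^{x-\theta}$ on a grid and compares the posterior mean with $x + 1$; the observation value is arbitrary and chosen only for illustration.

```python
import numpy as np

# Check the Uniform(0, theta) / Gamma(2, 1) example: for a single observation x,
# the posterior mean should come out as x + 1.
x = 2.7                                            # arbitrary hypothetical observation
theta = np.linspace(x, x + 60.0, 200000)           # posterior support is theta > x
dtheta = theta[1] - theta[0]

posterior = np.exp(x - theta)                      # h(theta | x) = e^{x - theta}
post_mean = np.sum(theta * posterior) * dtheta     # E(theta | x) by numerical integration
print(post_mean, x + 1)                            # both approximately 3.7
```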

Suppose $X \sim \text{Binomial}(n,p)$, and we assume that $p \sim \text{Beta}(\alpha,\beta)$. The posterior distribution of p:

$$
\begin{aligned}
g(p) &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{\alpha-1}(1-p)^{\beta-1}\\[4pt]
L(p) &= f(x;p) = \binom{n}{x} p^{x}(1-p)^{n-x}\\[4pt]
f(x,p) &= L(p)\,g(p) = \binom{n}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{x+\alpha-1}(1-p)^{n-x+\beta-1}\\[4pt]
f(x) &= \int_0^1 f(x,p)\,dp
 = \binom{n}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}
   \frac{\Gamma(x+\alpha)\Gamma(n-x+\beta)}{\Gamma(n+\alpha+\beta)}
   \underbrace{\int_0^1 \frac{\Gamma(n+\alpha+\beta)}{\Gamma(x+\alpha)\Gamma(n-x+\beta)}\,p^{x+\alpha-1}(1-p)^{n-x+\beta-1}\,dp}_{=\,1\ \text{(Beta density)}}\\
 &= \binom{n}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}
    \frac{\Gamma(x+\alpha)\Gamma(n-x+\beta)}{\Gamma(n+\alpha+\beta)}\\[4pt]
h(p \mid x) &= \frac{f(x,p)}{f(x)}
 = \frac{\Gamma(n+\alpha+\beta)}{\Gamma(x+\alpha)\Gamma(n-x+\beta)}\,p^{x+\alpha-1}(1-p)^{n-x+\beta-1}
 \qquad \text{(a Beta}(x+\alpha,\ n-x+\beta)\text{ density: the posterior belongs to the same family)}\\[4pt]
\hat{p}_B &= E(p \mid x) = \int_0^1 p\,h(p \mid x)\,dp
 = \frac{x+\alpha}{x+\alpha+n-x+\beta} = \frac{x+\alpha}{n+\alpha+\beta}
 = \frac{x}{n+\alpha+\beta} + \frac{\alpha}{n+\alpha+\beta}\\
 &= \frac{x}{n}\cdot\frac{n}{n+\alpha+\beta} + \frac{\alpha}{\alpha+\beta}\cdot\frac{\alpha+\beta}{n+\alpha+\beta}
 = \underbrace{\hat{p}}_{\text{MLE}}\,w \;+\; \underbrace{E(p)}_{\text{prior mean}}\,(1-w),
 \qquad w = \frac{n}{n+\alpha+\beta}
\end{aligned}
$$
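The conjugate update and the weighted-average form of the posterior mean are easy to verify numerically. The sketch below uses made-up values of α, β, n and x (purely illustrative) and checks that $(x+\alpha)/(n+\alpha+\beta)$ equals $w\,\hat{p} + (1-w)\,E(p)$.

```python
# Conjugate Beta-Binomial update from the example above: with p ~ Beta(alpha, beta)
# and x successes in n trials, the posterior is Beta(x + alpha, n - x + beta).
# The numbers below are hypothetical, chosen only for illustration.
alpha, beta = 2.0, 3.0
n, x = 20, 12

post_alpha, post_beta = x + alpha, n - x + beta          # posterior Beta parameters
post_mean = post_alpha / (post_alpha + post_beta)        # (x + alpha) / (n + alpha + beta)

w = n / (n + alpha + beta)                               # weight given to the data
mle = x / n                                              # p-hat = x / n
prior_mean = alpha / (alpha + beta)                      # E(p) under the prior
print(post_mean, w * mle + (1 - w) * prior_mean)         # identical values

# With a strongly informative prior (alpha + beta large relative to n), w shrinks
# toward 0 and the prior mean dominates the estimate, as noted earlier.
```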