Based upon the view of probability as a 'degree of belief', we consider the parameter $\theta$ to be a random variable that follows a distribution. We make an assumption about the distribution that $\theta$ follows before we see the data. This distribution is called the prior distribution. We denote the prior probability density function of $\theta$ by $\pi(\theta)$.
We update the prior information about the parameter with the information contained in the data, via the likelihood function, to obtain a distribution that describes the parameter better; this distribution is called the posterior distribution.
Prior distribution of $\theta$: $\pi(\theta)$, the unconditional (marginal) distribution of $\theta$.
The likelihood of $\theta$ given the data $\mathbf{x} = (x_1, \ldots, x_n)$: $L(\theta \mid \mathbf{x}) = f(\mathbf{x} \mid \theta)$.
Posterior distribution of $\theta$: the conditional distribution of $\theta$ given the data,
$$\pi(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\,\pi(\theta)}{m(\mathbf{x})}.$$
The marginal distribution of the data, $m(\mathbf{x}) = \int f(\mathbf{x} \mid \theta)\,\pi(\theta)\,d\theta$, is called the normalizing constant.
Given the posterior distribution of $\theta$, the posterior mean or the posterior median is a typical Bayesian estimate of $\theta$. The posterior mean of $\theta$ is
$$E(\theta \mid \mathbf{x}) = \int \theta\,\pi(\theta \mid \mathbf{x})\,d\theta.$$
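As a concrete numerical illustration of these quantities, the following sketch computes the posterior, the normalizing constant $m(\mathbf{x})$, and the posterior mean and median on a grid; the Bernoulli likelihood, flat prior, and data values are hypothetical choices made only for this illustration.

```python
import numpy as np

# Hypothetical 0/1 observations; theta is the Bernoulli success probability.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])

theta = np.linspace(0.001, 0.999, 999)   # grid of parameter values
dtheta = theta[1] - theta[0]

prior = np.ones_like(theta)              # flat prior pi(theta)
prior /= prior.sum() * dtheta            # normalize so it integrates to 1

k, n = data.sum(), data.size
likelihood = theta**k * (1 - theta)**(n - k)   # L(theta | x)

unnormalized = likelihood * prior
m_x = (unnormalized * dtheta).sum()      # normalizing constant m(x)
posterior = unnormalized / m_x           # pi(theta | x)

post_mean = (theta * posterior * dtheta).sum()   # E(theta | x)
cdf = np.cumsum(posterior * dtheta)
post_median = theta[np.searchsorted(cdf, 0.5)]   # posterior median
print(post_mean, post_median)
```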
A prior distribution that leads to a posterior distribution belonging to the same distribution family is called a conjugate prior; the distribution family that both the prior and the posterior belong to is called the conjugate family.
The conjugate family has a nice property in that the posterior follows a known form of distribution, so the posterior can be written down without computing the normalizing constant explicitly.
Note: if the prior is highly informative, it can have more impact on the estimator than the data, especially when the sample size is small.
If it is hard to specify the parameters of the prior distribution, we can use a flat/vague (non-informative) prior.
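To see both points, here is a small sketch (assuming the Beta-Binomial setting worked out in the example below) comparing a flat prior with a strongly informative one on a small sample; the prior parameters and data counts are hypothetical.

```python
def beta_binomial_posterior_mean(alpha, beta, x, n):
    # With a Beta(alpha, beta) prior and x successes in n Bernoulli trials,
    # the posterior is Beta(alpha + x, beta + n - x); this returns its mean.
    return (alpha + x) / (alpha + beta + n)

x, n = 9, 10   # hypothetical small sample: 9 successes in 10 trials

flat = beta_binomial_posterior_mean(1, 1, x, n)       # vague Beta(1, 1) prior
strong = beta_binomial_posterior_mean(50, 50, x, n)   # informative prior centered at 0.5

print(flat)    # ~0.83, close to the sample proportion 0.9
print(strong)  # ~0.54, pulled toward the prior mean 0.5
```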
Examples
Suppose $X \sim \mathrm{Binomial}(n, p)$, and we assume that $p \sim \mathrm{Beta}(\alpha, \beta)$. The posterior distribution of $p$ is
$$p \mid x \sim \mathrm{Beta}(\alpha + x,\ \beta + n - x).$$
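As a short derivation (under the Binomial likelihood and Beta prior assumed above), keep only the factors that depend on $p$:
$$\pi(p \mid x) \propto L(p \mid x)\,\pi(p) \propto p^{x}(1-p)^{n-x} \cdot p^{\alpha-1}(1-p)^{\beta-1} = p^{\alpha+x-1}(1-p)^{\beta+n-x-1},$$
which is the kernel of a $\mathrm{Beta}(\alpha + x,\ \beta + n - x)$ distribution. The posterior mean is therefore
$$E(p \mid x) = \frac{\alpha + x}{\alpha + \beta + n},$$
a compromise between the prior mean $\alpha/(\alpha+\beta)$ and the sample proportion $x/n$.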