Penalized Likelihood

When a large number of variables/predictors are available for predicting the response, the conventional MLE/least squares estimates tend to overfit, producing high-variance estimates.

Used when we want to penalize model complexity to encourage a simpler model:

$$ l_p(\theta; x) = l(\theta; x) - \lambda P(\theta) $$
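As a minimal sketch (not from the notes), this is what the penalized log-likelihood might look like for a Gaussian linear model in Python; the function name `penalized_loglik` and the two penalty options are illustrative assumptions:

```python
import numpy as np

def penalized_loglik(beta, X, y, lam, penalty="ridge", sigma2=1.0):
    # l_p(beta) = l(beta) - lam * P(beta) for a Gaussian linear model.
    # The name and penalty options here are illustrative assumptions.
    resid = y - X @ beta
    n = len(y)
    loglik = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (resid @ resid) / sigma2
    if penalty == "ridge":
        pen = beta @ beta            # P(beta) = sum of beta_j^2
    elif penalty == "lasso":
        pen = np.abs(beta).sum()     # P(beta) = sum of |beta_j|
    else:
        raise ValueError("unknown penalty")
    return loglik - lam * pen
```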

Ridge Regression

$$ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i $$

$$ \hat\beta = (X^T X)^{-1} X^T Y $$

$$ \operatorname{var}(\hat\beta) = \operatorname{var}\!\left((X^T X)^{-1} X^T Y\right) = (X^T X)^{-1} X^T \operatorname{var}(Y)\, X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}, \qquad \operatorname{var}(Y) = \sigma^2 I $$

$$ \operatorname{var}(\hat y) = \operatorname{var}(X \hat\beta) = \operatorname{var}\!\left(X (X^T X)^{-1} X^T Y\right) = X \operatorname{var}(\hat\beta)\, X^T = \sigma^2\, X (X^T X)^{-1} X^T $$
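A quick numpy check of these closed-form expressions, assuming simulated data (the dimensions, coefficients, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 0.5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# OLS closed form: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Sampling covariance: Var(beta_hat) = sigma^2 (X^T X)^{-1}
cov_beta_hat = sigma**2 * np.linalg.inv(X.T @ X)

print("beta_hat:  ", beta_hat)
print("std errors:", np.sqrt(np.diag(cov_beta_hat)))
```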

The ordinary least squares estimate $\hat\beta^{\text{OLS}}$ of $\beta$ is the solution that minimizes the residual sum of squares:

$$ \hat\beta^{\text{OLS}} = \arg\min_\beta \left\{ \sum_{i=1}^{n} \epsilon_i^2 \right\} $$

$$ \hat\beta^{\text{ridge}} = \arg\min_\beta \left\{ \sum_{i=1}^{n} \left(y_i - x_i^T \beta\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\} = (X^T X + \lambda I)^{-1} X^T y $$
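A minimal sketch of the ridge closed form (the helper name `ridge_closed_form` is an assumption, not standard API):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    # beta_ridge = (X^T X + lam * I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Setting `lam = 0` recovers the OLS solution; larger values shrink the coefficients toward zero.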

By minimizing the error term plus the squared betas, we reduce the model's reliance on any specific beta to explain y (overfitting). Geometrically, the ridge penalty constrains the coefficient vector to a circle in two dimensions, or a sphere/hypersphere in higher dimensions; see the constrained form below.
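Equivalently (a standard reformulation, not spelled out in the original notes), the penalized problem can be written as a constrained one, which makes the circle/sphere picture explicit:

$$ \hat\beta^{\text{ridge}} = \arg\min_\beta \sum_{i=1}^{n} \left(y_i - x_i^T \beta\right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t $$

for some $t \ge 0$ that corresponds one-to-one with $\lambda$.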

$$ \frac{\partial\, l_p(\beta)}{\partial \beta_j} \bigg|_{\beta = \hat\beta} = 0, \qquad \lambda \in \{\lambda_{\min}, \dots, \lambda_{\max}\}, \qquad \lambda_{\max} = \min \lambda \ \text{ s.t. all } \hat\beta_j = 0, \ j \neq 0 $$
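A short sketch of the coefficient path over a $\lambda$ grid (the grid and simulated data are arbitrary assumptions). Note that ridge coefficients shrink toward zero but reach exactly zero only in the limit; the $\lambda_{\max}$ rule of all coefficients being exactly zero holds for the LASSO path rather than ridge.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

# Ridge path: re-solve (X^T X + lam I) beta = X^T y over a lambda grid.
# The squared norm of beta shrinks monotonically as lambda grows.
for lam in np.logspace(-2, 4, 7):
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda = {lam:10.2f}   ||beta||^2 = {beta @ beta:.4f}")
```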

Exercise: show that $\operatorname{Var}(\hat\beta^{\text{ridge}}) \preceq \operatorname{Var}(\hat\beta^{\text{OLS}})$, i.e. the ridge estimator has smaller variance than the unbiased OLS estimator (at the cost of introducing bias).
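A numeric check of this claim, using the standard sandwich form $\operatorname{Var}(\hat\beta^{\text{ridge}}) = \sigma^2 (X^T X + \lambda I)^{-1} X^T X (X^T X + \lambda I)^{-1}$ and arbitrary simulated X:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
sigma2, lam = 0.25, 5.0
XtX = X.T @ X
I = np.eye(X.shape[1])

cov_ols = sigma2 * np.linalg.inv(XtX)           # sigma^2 (X^T X)^{-1}
S = np.linalg.inv(XtX + lam * I)
cov_ridge = sigma2 * S @ XtX @ S                # sandwich form

# Var(OLS) - Var(ridge) should be positive semidefinite:
print(np.linalg.eigvalsh(cov_ols - cov_ridge))  # all eigenvalues >= 0
```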

Least Absolute Shrinkage and Selection Operator (LASSO)