Test of k proportions

The chi-squared test statistic for a single proportion is the square of the z-test statistic.
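As a quick numerical check of this identity, here is a minimal sketch. The counts and the value of `theta0` are made up for illustration and do not come from the notes.

```python
import math

# Hypothetical data: 60 successes in 100 trials, tested against theta0 = 0.5.
x, n, theta0 = 60, 100, 0.5

# z statistic for a single proportion.
z = (x - n * theta0) / math.sqrt(n * theta0 * (1 - theta0))

# Pearson chi-squared statistic over the two cells (successes, failures).
observed = [x, n - x]
expected = [n * theta0, n * (1 - theta0)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The two numbers agree up to floating-point error.
print(z ** 2, chi2)
```

Both quantities measure the same squared, standardised distance from $n θ_{0}$, which is why the two tests give the same decision for a two-sided alternative.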

2 choices

H_{0} : θ_{1} = θ_{2} = \dots = θ_{k} = θ_{0} \quad \text{vs} \quad H_{a} : \exists i \neq j, θ_{i} \neq θ_{j}

If the $n_{i}$ are sufficiently large for each population, we construct $k$ independent test statistics

E (X_{i}) = n_{i} θ_{i}, \quad \operatorname{Var} (X_{i}) = n_{i} θ_{i} (1 - θ_{i})

By CLT:

\begin{aligned} Z_{i} & = \frac{x_{i} - n_{i} θ_{i}}{\sqrt{n_{i} θ_{i} (1 - θ_{i})}} \overset{\text{approx}}{\sim} N (0, 1) \\ \sum_{i = 1}^{k} Z_{i}^{2} & = \sum_{i = 1}^{k} {\left(\frac{x_{i} - n_{i} θ_{i}}{\sqrt{n_{i} θ_{i} (1 - θ_{i})}}\right)}^{2} \overset{\text{approx}}{\sim} χ_{k}^{2} \end{aligned}

Reject $H_{0}$ if

χ_{obs}^{2} = \sum_{i = 1}^{k} \frac{(x_{i} - n_{i} θ_{i})^{2}}{n_{i} θ_{i} (1 - θ_{i})} \geq χ_{k, 1 - α}^{2}, \quad \text{with } θ_{i} = θ_{0} \text{ under } H_{0}
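The rejection rule for a specified $θ_{0}$ can be sketched as follows. The counts are made up, the function name `chi2_known_theta0` is hypothetical, and the critical value is the 0.95 quantile of $χ_{3}^{2}$ taken from a standard table rather than computed.

```python
def chi2_known_theta0(x, n, theta0):
    """Sum of squared z-scores, each evaluated at the hypothesised theta0."""
    return sum(
        (xi - ni * theta0) ** 2 / (ni * theta0 * (1 - theta0))
        for xi, ni in zip(x, n)
    )

# Made-up counts for k = 3 samples (illustrative only).
x = [18, 25, 30]
n = [50, 50, 60]
theta0 = 0.5

chi2_obs = chi2_known_theta0(x, n, theta0)

# 0.95 quantile of chi-squared with k = 3 degrees of freedom (table value).
CRIT_3_95 = 7.815
reject = chi2_obs >= CRIT_3_95

print(chi2_obs, reject)
```

Because $θ_{0}$ is specified rather than estimated, every one of the $k$ terms contributes a full degree of freedom.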

When $θ_{0}$ is not given (testing only if they are not the same), we need to estimate it first

\begin{aligned} {\hat{θ}}_{0} & = \frac{\sum x_{i}}{\sum n_{i}} \\ χ^{2} & = \sum_{i = 1}^{k} \frac{(x_{i} - n_{i} {\hat{θ}}_{0})^{2}}{n_{i} {\hat{θ}}_{0} (1 - {\hat{θ}}_{0})} \sim χ_{k - 1}^{2} \end{aligned}
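When the common proportion has to be estimated, the pooled estimate replaces $θ_{0}$ and one degree of freedom is lost. A minimal sketch with made-up counts; the critical value is the 0.95 quantile of $χ_{2}^{2}$ from a standard table.

```python
# Made-up counts for k = 3 samples (illustrative only).
x = [18, 25, 37]
n = [50, 50, 60]

# Pooled estimate of the common proportion under H0.
theta_hat = sum(x) / sum(n)

# Same statistic as before, with theta_hat plugged in for theta0.
chi2_obs = sum(
    (xi - ni * theta_hat) ** 2 / (ni * theta_hat * (1 - theta_hat))
    for xi, ni in zip(x, n)
)

# One parameter was estimated, so df = k - 1 = 2 (table value for 0.95).
df = len(x) - 1
CRIT_2_95 = 5.991
reject = chi2_obs >= CRIT_2_95

print(theta_hat, chi2_obs, df, reject)
```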

Let $X_{1}, \dots, X_{k}$ be the numbers of successes observed in $k$ independent samples of sizes $n_{1}, \dots, n_{k}$

	successes	failures
sample 1	$x_{1}$	$n_{1} - x_{1}$
sample 2	$x_{2}$	$n_{2} - x_{2}$
$⋮$	$⋮$	$⋮$
sample $k$	$x_{k}$	$n_{k} - x_{k}$
Let $f_{i j}$ be the observed frequency in the $i$ th row and $j$ th column
Under the hypothesis that $θ_{1} = θ_{2} = \dots = θ_{k} = θ_{0}$ , the expected cell frequencies $E_{i j}$ 's are given by

E (X_{i 1}) = n_{i} θ_{0} = E_{i 1}, \quad E (X_{i 2}) = n_{i} (1 - θ_{0}) = E_{i 2}

\begin{aligned} χ^{2} & = \sum_{i = 1}^{k} \sum_{j = 1}^{2} \frac{(f_{i j} - E_{i j})^{2}}{E_{i j}} \quad \text{(easier formula)} \\ & = \sum_{i = 1}^{k} \left[\frac{(x_{i} - n_{i} θ_{0})^{2}}{n_{i} θ_{0}} + \frac{(n_{i} - x_{i} - n_{i} (1 - θ_{0}))^{2}}{n_{i} (1 - θ_{0})}\right] \\ & = \sum_{i = 1}^{k} \left[\frac{(x_{i} - n_{i} θ_{0})^{2} (1 - θ_{0}) + (n_{i} - x_{i} - n_{i} + n_{i} θ_{0})^{2} θ_{0}}{n_{i} θ_{0} (1 - θ_{0})}\right] \\ & = \sum_{i = 1}^{k} \left[\frac{(x_{i} - n_{i} θ_{0})^{2} (1 - θ_{0}) + (x_{i} - n_{i} θ_{0})^{2} θ_{0}}{n_{i} θ_{0} (1 - θ_{0})}\right] \\ χ^{2} & = \sum_{i = 1}^{k} \frac{(x_{i} - n_{i} θ_{0})^{2}}{n_{i} θ_{0} (1 - θ_{0})} \quad \text{(actual definition)} \end{aligned}

When $θ_{0}$ is given or specified, $χ^{2} \sim χ_{k}^{2}$ under $H_{0}$.
When $θ_{0}$ is not given, use ${\hat{θ}}_{0} = \frac{\sum x_{i}}{\sum n_{i}}$ to compute the $E_{i j}$'s; then $χ^{2} \sim χ_{k - 1}^{2}$ under $H_{0}$.
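The equivalence between the double sum over cells and the one-term-per-sample form can be checked numerically. This is only a sketch: the function names `pearson_two_column` and `short_form`, the counts, and the value of `theta0` are all hypothetical.

```python
import math

def pearson_two_column(x, n, theta0):
    """Double sum over the (successes, failures) cells of each sample."""
    total = 0.0
    for xi, ni in zip(x, n):
        observed = (xi, ni - xi)
        expected = (ni * theta0, ni * (1 - theta0))
        total += sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return total

def short_form(x, n, theta0):
    """One term per sample, with the binomial variance in the denominator."""
    return sum(
        (xi - ni * theta0) ** 2 / (ni * theta0 * (1 - theta0))
        for xi, ni in zip(x, n)
    )

# Made-up counts and a made-up theta0 (illustrative only).
x = [12, 30, 41]
n = [40, 70, 90]
theta0 = 0.4

long_value = pearson_two_column(x, n, theta0)
short_value = short_form(x, n, theta0)

# True: the two forms agree up to floating-point error.
print(math.isclose(long_value, short_value))
```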

With more than 2 choices

If, instead of success and failure, each trial has more than two possible outcomes, the counts in each sample follow a multinomial distribution instead of a binomial one.

	choice 1	choice 2	choice 3
sample 1	$x_{1}$	$y_{1}$	$z_{1}$
sample 2	$x_{2}$	$y_{2}$	$z_{2}$
$⋮$	$⋮$	$⋮$	$⋮$
sample $k$	$x_{k}$	$y_{k}$	$z_{k}$

$x_{i} + y_{i} + z_{i} + \dots = n_{i .}$ (row total for sample $i$)
$f_{1 j} + f_{2 j} + \dots + f_{k j} = n_{. j}$ (column total for choice $j$)

Not testing the rows

$(X_{i}, Y_{i}, Z_{i}) \sim \text{Multinomial} (n_{i}, (θ_{i 1}, θ_{i 2}, 1 - θ_{i 1} - θ_{i 2}))$ (the last probability is fixed by the others, so it is not estimated)

$H_{0} :$ all $θ_{i 1}$'s are equal, all $θ_{i 2}$'s are equal, ...
$H_{a} :$ not all $θ_{i 1}$'s are equal or not all $θ_{i 2}$'s are equal or ...

Under $H_{0} : θ_{i 1} = θ_{1}, \dots, θ_{i, c - 1} = θ_{c - 1}$ for all $i$, where $c$ is the number of columns and the $θ_{j}$'s are not specified (the last column has no freedom, since each row's probabilities sum to 1):

\begin{aligned} {\hat{θ}}_{j} & = \frac{\sum_{i} f_{i j}}{\sum_{i} n_{i}}, \quad E_{i j} = n_{i} {\hat{θ}}_{j} \\ χ^{2} & = \sum_{i} \sum_{j} \frac{(f_{i j} - E_{i j})^{2}}{E_{i j}} \sim χ_{df}^{2} \\ df & = \# \text{ parameters under } H_{a} - \# \text{ parameters under } H_{0} = k (c - 1) - (c - 1) = (k - 1) (c - 1) \end{aligned}

Reject $H_{0}$ if

χ_{obs}^{2} > χ_{df, 1 - α}^{2}
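The test with estimated column probabilities can be sketched as below. The function name `homogeneity_chi2` and the 2-by-3 table are hypothetical, and the critical value is the 0.95 quantile of $χ_{2}^{2}$ taken from a standard table.

```python
def homogeneity_chi2(table):
    """Pearson statistic with column probabilities estimated from pooled data."""
    k, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)

    # Pooled estimate of each column probability under H0.
    theta_hat = [sum(row[j] for row in table) / grand_total for j in range(c)]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = row_totals[i] * theta_hat[j]  # expected count n_i * theta_hat_j
            chi2 += (f_ij - e_ij) ** 2 / e_ij

    df = (k - 1) * (c - 1)
    return chi2, df

# Made-up table: k = 2 samples (rows), c = 3 choices (columns).
table = [
    [20, 30, 50],
    [30, 30, 40],
]

chi2_obs, df = homogeneity_chi2(table)

# 0.95 quantile of chi-squared with 2 degrees of freedom (table value).
CRIT_2_95 = 5.991
reject = chi2_obs > CRIT_2_95

print(chi2_obs, df, reject)
```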

Testing rows

$\sum_{j} θ_{i j} = 1$ , $\sum_{j} X_{i j} = n_{i .}$

For each population $i$

\begin{aligned} (X_{i 1}, X_{i 2}, X_{i 3}) & \sim \text{Multinomial} (n_{i .}, (θ_{i 1}, θ_{i 2}, 1 - θ_{i 1} - θ_{i 2})) \\ \equiv (X_{i 1}, X_{i 2}) & \sim \text{Multinomial} (n_{i .}, (θ_{i 1}, θ_{i 2})) \\ P (X_{i 1} = x_{i 1}, X_{i 2} = x_{i 2}) & = \frac{n_{i .}!}{x_{i 1}! x_{i 2}! (n_{i .} - x_{i 1} - x_{i 2})!} θ_{i 1}^{x_{i 1}} θ_{i 2}^{x_{i 2}} (1 - θ_{i 1} - θ_{i 2})^{n_{i .} - x_{i 1} - x_{i 2}} \end{aligned}
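For three categories, the multinomial probability can be evaluated directly from the standard formula. A minimal sketch; the function name `trinomial_pmf` and the numbers are hypothetical.

```python
from math import factorial

def trinomial_pmf(x1, x2, n, theta1, theta2):
    """P(X1 = x1, X2 = x2) for a three-category multinomial with n trials."""
    x3 = n - x1 - x2               # the third count is fixed by the first two
    theta3 = 1 - theta1 - theta2   # the third probability is fixed as well
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * theta1 ** x1 * theta2 ** x2 * theta3 ** x3

# Made-up example: n = 4 trials, probabilities (0.5, 0.3, 0.2).
p = trinomial_pmf(2, 1, 4, 0.5, 0.3)
print(p)
```

Only two counts and two probabilities are passed in, which mirrors the point that the last category carries no freedom of its own.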

Testing if $θ_{i 1}$ 's and $θ_{i 2}$ 's are equal

\begin{aligned} H_{0} & : \forall i, & θ_{i 1} = θ_{1}, θ_{i 2} = θ_{2}, \dots \\ H_{a} & : \exists i, & θ_{i 1} \neq θ_{1} \text{ or } θ_{i 2} \neq θ_{2} \text{ or } \dots \end{aligned}

When the $θ_{j}$'s are given, no parameters are estimated, so $E_{i j} = n_{i .} θ_{j}$ and $df = k (c - 1)$

Reject $H_{0}$ if

χ_{obs}^{2} = \sum_{i} \sum_{j} \frac{(f_{i j} - E_{i j})^{2}}{E_{i j}} > χ_{df, 1 - α}^{2}
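When the column probabilities are specified in advance, the expected counts come straight from them and the degrees of freedom rise to $k (c - 1)$. A minimal sketch: the function name `chi2_given_probs`, the table, and the probabilities are made up, and the critical value is the 0.95 quantile of $χ_{4}^{2}$ from a standard table.

```python
def chi2_given_probs(table, theta):
    """Pearson statistic when the column probabilities theta are specified."""
    k, c = len(table), len(theta)
    chi2 = 0.0
    for row in table:
        n_i = sum(row)  # row total
        for f_ij, theta_j in zip(row, theta):
            e_ij = n_i * theta_j  # expected count under H0
            chi2 += (f_ij - e_ij) ** 2 / e_ij
    df = k * (c - 1)  # nothing is estimated, so each row keeps c - 1 df
    return chi2, df

# Made-up table (k = 2, c = 3) and made-up hypothesised probabilities.
table = [
    [20, 30, 50],
    [30, 30, 40],
]
theta = [0.25, 0.30, 0.45]

chi2_obs, df = chi2_given_probs(table, theta)

# 0.95 quantile of chi-squared with 4 degrees of freedom (table value).
CRIT_4_95 = 9.488
reject = chi2_obs > CRIT_4_95

print(chi2_obs, df, reject)
```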

To determine whether the rows and columns are independent, use a test of association between them.