Let \(X_{1}, X_{2}, \dots , X_{n}\) be \(n\) random variables.
Sample mean:
\[ \bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} \]
Sample variance:
\[ s^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} ( X_{i} - \bar{X} )^{2} = \frac{1}{n(n - 1)} [n \sum_{i = 1}^{n} X_{i}^{2} - (\sum_{i = 1}^{n} X_{i})^{2}] \]
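As a quick check of the two formulas above, here is a minimal Python sketch. The function names and the data values are made up for illustration; they are not from the notes. It computes the sample variance with both the definition and the shortcut form and compares them against the standard library.

```python
import statistics

def sample_mean(xs):
    # X-bar = (1/n) * sum of the observations
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Definition: sum of squared deviations from X-bar, divided by n - 1
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    # Shortcut form: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical observations

print(sample_mean(data))
print(sample_variance(data))
print(sample_variance_shortcut(data))
print(statistics.variance(data))  # library version, also divides by n - 1
```

The shortcut form avoids a second pass over the data, but it subtracts two large numbers, so the definition form is usually the safer choice in floating point.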
If \(X_{1}, X_{2}, \dots , X_{n}\) are independent with \(X_{i} \sim \mathcal{N}(\mu , \sigma^{2})\), then \(\bar{X}\) is also a normal random variable, with mean \(\mu\) and variance \(\frac{\sigma^{2}}{n}\): \(\bar{X} \sim \mathcal{N}( \mu , \frac{\sigma^{2}}{n} )\).
Central limit theorem (CLT): if \(\bar{X}\) is the mean of a random sample of size \(n\) from a population with mean \(\mu\) and finite variance \(\sigma^{2}\), then the limiting form of the distribution of
\[ Z = \frac{\bar{X} - \mu }{\sigma / \sqrt{n}} \]
as \(n \to \infty\), is the standard normal distribution, i.e., \(Z \sim \mathcal{N}(0, 1)\).
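The statement can be illustrated with a small simulation. This is only a sketch under assumptions of my own: the population is taken to be Exponential(1) (clearly non-normal, with mean 1 and standard deviation 1), and the sample size, replication count, and seed are arbitrary. If the standardized mean is close to standard normal, roughly 95% of the simulated \(Z\) values should fall within \(\pm 1.96\).

```python
import math
import random

def standardized_mean(n, rng):
    # Assumed population: Exponential(1), which has mu = 1 and sigma = 1
    mu, sigma = 1.0, 1.0
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    # Z = (X-bar - mu) / (sigma / sqrt(n))
    return (xbar - mu) / (sigma / math.sqrt(n))

rng = random.Random(0)  # fixed seed so the run is repeatable
zs = [standardized_mean(50, rng) for _ in range(4000)]

# Fraction of simulated Z values inside the central 95% band of N(0, 1)
frac_within = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(frac_within)
```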
When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean 4.0 g, and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity \(\bar{X}\) is between 3.5 and 3.8 g?
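We know:
\[ \mu = 4.0, \quad \sigma = 1.5, \quad n = 50 \]
Since \(n = 50 > 30\), the sample is large enough to apply the CLT.
The CLT tells us that \(\bar{X}\) is approximately a normal random variable, with mean \(\mu_{\bar{X}}\) and standard deviation \(\sigma_{\bar{X}}\) given by
\[ \mu_{\bar{X}} = \mu = 4.0, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} \approx 0.2121 \]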
So, we want to find:
\[ P(3.5 \leq \bar{X} \leq 3.8) \]
We can convert this to the standard normal distribution using \(z = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\):
\[ P(\frac{3.5 - 4}{0.2121} \leq z \leq \frac{3.8 - 4}{0.2121}) = P(-2.36 \leq z \leq -0.94) = P(z \leq -0.94) - P(z \leq -2.36) \]
From the standard normal table, \(P(z \leq -0.94) = 0.1736\) and \(P(z \leq -2.36) = 0.0091\), so the probability is approximately \(0.1645\).
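The same calculation can be reproduced without a table. The sketch below uses only the numbers given in the problem; the helper name `phi` is my own, and it builds the standard normal CDF from `math.erf`. Because it does not round the z-values to two decimals, the result differs slightly from the table-based answer.

```python
import math

def phi(z):
    # Standard normal CDF, written in terms of the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 4.0, 1.5, 50
se = sigma / math.sqrt(n)      # standard deviation of X-bar

z_lo = (3.5 - mu) / se         # about -2.36
z_hi = (3.8 - mu) / se         # about -0.94

prob = phi(z_hi) - phi(z_lo)   # P(3.5 <= X-bar <= 3.8) under the CLT approximation
print(se, z_lo, z_hi, prob)
```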
Suppose we have two independent samples of sizes \(n_{1}\) and \(n_{2}\), drawn from populations with means \(\mu_{1}\), \(\mu_{2}\) and variances \(\sigma^{2}_{1}\), \(\sigma^{2}_{2}\) respectively. The sampling distribution of the difference of means, \(\bar{X_{1}} - \bar{X_{2}}\), is approximately normal with mean and variance given by:
\[ \mu_{\bar{X_{1}} - \bar{X_{2}}} = \mu_{1} - \mu_{2} \]
\[ \sigma^{2}_{\bar{X_{1}} - \bar{X_{2}}} = \frac{\sigma^{2}_{1}}{n_{1}} + \frac{\sigma^{2}_{2}}{n_{2}} \]
Hence,
\[ Z = \frac{(\bar{X_{1}} - \bar{X_{2}}) - (\mu_{1} - \mu_{2})}{\sqrt{\sigma^{2}_{1} / n_{1} + \sigma^{2}_{2} / n_{2}}} \sim \mathcal{N}(0, 1) \]
If both \(n_{1}\) and \(n_{2}\) are greater than or equal to 30, the normal approximation for the distribution of \(\bar{X_{1}} - \bar{X_{2}}\) is very good, provided the underlying population distributions are not too far from normal.
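A minimal sketch of the standardization above, written as a Python function. The function name and all numeric inputs are hypothetical, chosen only so the arithmetic is easy to follow; the population variances are assumed known.

```python
import math

def two_sample_z(xbar1, xbar2, mu1, mu2, var1, var2, n1, n2):
    # Standard deviation of X1-bar minus X2-bar: sqrt(var1/n1 + var2/n2)
    se = math.sqrt(var1 / n1 + var2 / n2)
    # Z = [(X1-bar - X2-bar) - (mu1 - mu2)] / se
    return ((xbar1 - xbar2) - (mu1 - mu2)) / se

# Hypothetical inputs: both samples have at least 30 observations
z = two_sample_z(xbar1=52.0, xbar2=50.0, mu1=50.0, mu2=50.0,
                 var1=16.0, var2=9.0, n1=64, n2=36)
print(z)
```

With these inputs the standard error is \(\sqrt{0.25 + 0.25}\), so \(z = 2/\sqrt{0.5} \approx 2.83\).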
For the chi-squared distribution:
Consider \(X_{1}, X_{2}, \dots , X_{n}\) i.i.d. from a normal distribution \(\mathcal{N}(\mu , \sigma^{2})\).
The sample variance \(s^{2}\) is an unbiased estimate of \(\sigma^{2}\).
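The notes do not spell out the link between \(s^{2}\) and the chi-squared distribution here. The standard result, added as an assumption, is that \((n - 1) s^{2} / \sigma^{2} \sim \chi^{2}(n - 1)\) for a normal sample. The simulation below is a rough check of that result: a \(\chi^{2}(v)\) variable has mean \(v\) and variance \(2v\), so with \(n = 10\) the simulated values should average near 9 with variance near 18. The parameter values and seed are arbitrary.

```python
import random
import statistics

def scaled_sample_variance(n, mu, sigma, rng):
    # Draw a normal sample and return (n - 1) * s^2 / sigma^2
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    return (n - 1) * statistics.variance(xs) / sigma ** 2

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 10
draws = [scaled_sample_variance(n, mu=4.0, sigma=1.5, rng=rng)
         for _ in range(5000)]

mean_draws = statistics.fmean(draws)    # should be near n - 1 = 9
var_draws = statistics.variance(draws)  # should be near 2 * (n - 1) = 18
print(mean_draws, var_draws)
```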
For the t-distribution, let
\[ z \sim \mathcal{N}(0, 1) \]
\[ V \sim \chi^{2} (v) \]
where \(z\) and \(V\) are independent.
Then the following ratio has a t-distribution with \(v\) degrees of freedom:
\[ T = \frac{z}{\sqrt{\frac{V}{v}}} \sim t(v) \]
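The construction can be simulated directly. In this sketch the chi-squared variable is built as a sum of \(v\) squared independent standard normals (a standard fact, not stated in the notes), and the check uses the known variance of a \(t(v)\) variable, \(v / (v - 2)\) for \(v > 2\). The choice \(v = 10\), the replication count, and the seed are arbitrary.

```python
import math
import random
import statistics

def draw_t(v, rng):
    # z ~ N(0, 1)
    z = rng.gauss(0.0, 1.0)
    # V ~ chi-squared(v), built as a sum of v squared standard normals,
    # drawn independently of z
    V = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))
    # T = z / sqrt(V / v)
    return z / math.sqrt(V / v)

rng = random.Random(2)  # fixed seed so the run is repeatable
v = 10
ts = [draw_t(v, rng) for _ in range(20000)]

mean_ts = statistics.fmean(ts)    # should be near 0
var_ts = statistics.variance(ts)  # should be near v / (v - 2) = 1.25
print(mean_ts, var_ts)
```

The variance above 1 reflects the heavier tails of the t-distribution compared with \(\mathcal{N}(0, 1)\).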
The F-distribution is built from two independent chi-squared random variables \(X_{1}\) and \(X_{2}\) with respective degrees of freedom \(v_{1}\) and \(v_{2}\).
The F distribution is then defined to be the distribution of this ratio:
\[ F = \frac{X_{1} / v_{1}}{X_{2} / v_{2}} \]
Complement identity for the F-distribution:
\[ F_{1 - \alpha}(v_{1}, v_{2}) = \frac{1}{F_{\alpha}(v_{2}, v_{1})} \]
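A short derivation sketch of this identity. It assumes that \(F_{\alpha}(v_{1}, v_{2})\) denotes the value with area \(\alpha\) to its right, a convention the notes do not state explicitly. If \(F \sim F(v_{1}, v_{2})\), then inverting the defining ratio gives \(1/F = \frac{X_{2} / v_{2}}{X_{1} / v_{1}} \sim F(v_{2}, v_{1})\). Since \(F > 0\),
\[ 1 - \alpha = P\left( F \geq F_{1 - \alpha}(v_{1}, v_{2}) \right) = P\left( \frac{1}{F} \leq \frac{1}{F_{1 - \alpha}(v_{1}, v_{2})} \right) \]
so that
\[ P\left( \frac{1}{F} \geq \frac{1}{F_{1 - \alpha}(v_{1}, v_{2})} \right) = \alpha \]
Because \(1/F \sim F(v_{2}, v_{1})\), the value with area \(\alpha\) to its right is \(F_{\alpha}(v_{2}, v_{1})\), which gives \(F_{\alpha}(v_{2}, v_{1}) = 1 / F_{1 - \alpha}(v_{1}, v_{2})\).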
Consider two independent random samples of \(m\) and \(n\) observations from the normal populations \(\mathcal{N}(\mu_{1}, \sigma_{1}^{2})\) and \(\mathcal{N}(\mu_{2}, \sigma_{2}^{2})\), with sample variances \(S_{1}^{2}\) and \(S_{2}^{2}\) respectively. Then:
\[ F = \frac{S_{1}^{2}/\sigma_{1}^{2}}{S_{2}^{2}/\sigma_{2}^{2}} \sim F(m - 1, n - 1) \]
that is, the ratio follows an F distribution with \(v_{1} = m - 1\) and \(v_{2} = n - 1\) degrees of freedom.
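As an illustration of the ratio above, the following sketch computes the F statistic and its degrees of freedom from two samples. The function name, the data, and the choice \(\sigma_{1}^{2} = \sigma_{2}^{2} = 1\) are all hypothetical.

```python
import statistics

def variance_ratio_f(sample1, sample2, sigma1_sq, sigma2_sq):
    # Sample variances S1^2 and S2^2 (each divides by its own size minus 1)
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)
    # F = (S1^2 / sigma1^2) / (S2^2 / sigma2^2)
    f = (s1_sq / sigma1_sq) / (s2_sq / sigma2_sq)
    # Degrees of freedom: v1 = m - 1, v2 = n - 1
    return f, len(sample1) - 1, len(sample2) - 1

a = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample, m = 5
b = [1.0, 2.0, 3.0, 4.0]       # hypothetical sample, n = 4

f_stat, v1, v2 = variance_ratio_f(a, b, sigma1_sq=1.0, sigma2_sq=1.0)
print(f_stat, v1, v2)
```

Here \(S_{1}^{2} = 3.3\) and \(S_{2}^{2} = 5/3\), so the statistic is 1.98 on \((4, 3)\) degrees of freedom.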