Sampling Distribution

Where \(X_{1}, X_{2}, \dots , X_{n}\) represent \(n\) random random variables:

Sample mean

\[ \bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} \]

Sample variance:

\[ s^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} ( X_{i} - \bar{X} )^{2} = \frac{1}{n(n - 1)} [n \sum_{i = 1}^{n} X_{i}^{2} - (\sum_{i = 1}^{n} X_{i})^{2}] \]

Sampling Distribution of Sample Means

\(\bar{X}\) is also a normal random variable with mean and variance (\(\bar{X} \sim \mathcal{N}( \mu , \frac{\sigma^{2}}{n} )\)), given that \(X_{1}, X_{2}, \dots , X_{n} \sim \mathcal{N}(\mu , \sigma^{2})\).

Central Limit Theorem

If \(\bar{X}\) is a mean of a random sample of size \(n\) from a population with mean \(\mu\) and finite variance \(\sigma^{2}\), then limit of the limiting form of the distribution of

\[ Z = \frac{\bar{X} - \mu }{\sigma / \sqrt{n}} \]

as \(n \to \infty\), is the standard normal distribution, i.e., \(Z \sim \mathcal{N}(0, 1)\).

Example

When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean 4.0 g, and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity \(\bar{X}\) is between 3.5 and 3.8 g?

We know:

We know that \(n = 50 > 30\), so we can apply CLT to this given the large enough sample size.

CLT tells us

Recall that \(\bar{X}\) is also a normal random variable, of which we know the mean (\(\mu_{\bar{X}}\)) and standard deviation (\(\sigma_{\bar{X}}\)) of.

So, we want to find:

\[ P(3.5 \leq \bar{X} \leq 3.8) \]

We can convert this into the standard normal distribution using \(z = \frac{x - \mu}{\sigma}\):

\[ P(\frac{3.5 - 4}{0.2121} \leq z \leq \frac{3.8 - 4}{0.2121}) = P(-2.36 \leq z \leq -0.94) = P(z \leq -0.94) - P(z \leq -2.36) \]

We can use the table to find this, which comes out to be \(0.1645\), our answer.

Sampling Distribution of the Difference between Two Means

Suppose we have two independent samples of size \(n_{1}\) and \(n_{2}\). \(n_{1}\) has mean \(\mu_{1}\) and variance \(\sigma^{2}_{1}\). \(n_{2}\) has mean \(\mu_{2}\) and variance \(\sigma^{2}_{2}\). The sampling distribution of the differences of means, \(\bar{X_{1}} - \bar{X_{2}}\) is approximately normally distributed with mean and variance given by:

\[ \mu_{\bar{X_{1}} - \bar{X_{2}}} = \mu_{1} - \mu_{2} \]

\[ \sigma^{2}_{\bar{X_{1}} - \bar{X_{2}}} = \frac{\sigma^{2}_{1}}{n_{1}} + \frac{\sigma^{2}_{2}}{n_{2}} \]

Hence,

\[ Z = \frac{(\bar{X_{1}} - \bar{X_{2}}) - (\mu_{1} - \mu_{2})}{\sqrt{\sigma^{2}_{1} / n_{1} + \sigma^{2}_{2} / n_{2}}} \sim \mathcal{N}(0, 1) \]

If both \(n_{1}\) and \(n_{2}\) are greater than or equal to 30, the normal approximation for the distribution of \(\bar{X_{1}} - \bar{X_{2}}\) is very good when the distributions are not too far away from normal.

Distribution of Sample Variance

For Chi Squared distribution:

Consider \(X_{1}, X_{2}, \dots , X_{n}\) i.i.d from a normal distribution.

  1. \(\bar{X}\) and \(s^{2}\) are independent
  2. \(\frac{(n - 1) s^{2}}{\sigma^{2}} \sim X^{2} (n - 1)\)
  3. \(\mu_{s^{2}} = \sigma^{2}\), \(\sigma^{2}_{s^{2}} = \frac{2 \sigma^{4}}{n - 1}\)

\(s^{2}\) is a good estimate of \(\sigma^{2}\).

t-Distribution

\[ z \sim \mathcal{N}(0, 1) \]

\[ V \sim X^{2} (v) \]

\(z\) and \(V\) are independent

This is a t-distribution with degrees of freedom \(v\):

\[ T = \frac{z}{\sqrt{\frac{V}{v}}} \sim t(v) \]

f-Distribution

F-distribution has two independent chi-squared random variables \(X_{1}\) and \(X_{2}\) with respective degrees of freedom \(v_{1}\) and \(v_{2}\).

The F distribution is then defined to be the distribution of this ratio:

\[ F = \frac{X_{1} / v_{1}}{X_{2} / v_{2}} \]

Complement expression for the f-Distribution

\[ F_{1 - \alpha}(v_{1}, v_{2}) = \frac{1}{F_{\alpha}(v_{2}, v_{1})} \]

Consider two independent random samples of \(m\) and \(n\) observations from the normal populations \(\mathcal{N}(\mu_{1}, \sigma_{1}^{2})\) and \(\mathcal{N}(\mu_{2}, \sigma_{2}^{2})\) respectively, then:

\[ F(m - 1, n - 1) = \frac{S_{1}^{2}/\sigma_{1}^{2}}{S_{2}^{2}/\sigma_{2}^{2}} \]

follows an F distribution. Note that \(v_{1} = m - 1\) and \(v_{2} = n - 1\).