Random variable – numerical description of the outcome of an experiment

For example, the number of heads observed when tossing a coin several times (worked out below) is a random variable.

Two types of random variables: discrete and continuous.

Discrete Random Variables

For example, consider tossing a fair coin 4 times, and let X be a discrete random variable representing the number of heads.

X can be 0, 1, 2, 3, and 4, with probabilities 1/16, 1/4, 3/8, 1/4, and 1/16 respectively.
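As a quick sanity check, a small Python sketch (standard library only; the values are taken from the coin example above) reproduces these probabilities from the binomial PMF:

```python
from math import comb

# Coin-toss check: P(X = x) = C(4, x) * 0.5^x * 0.5^(4 - x)
n, p = 4, 0.5
for x in range(n + 1):
    prob = comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"P(X = {x}) = {prob}")  # 0.0625, 0.25, 0.375, 0.25, 0.0625
```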

For a probability mass function \(f(x)\) of a discrete random variable:

  • \(f(x) \geq 0\) for every \(x\)
  • \(\sum_{x} f(x) = 1\)
  • \(P(X = x) = f(x)\)

The Cumulative Distribution Function (CDF) is like the PMF, except instead of P(X = x) it gives P(X <= x). So you just add the probabilities up, hence it is cumulative.

Expected value

\[ \mu = E(X) = \sum_{x} x f(x) \]

Where the sum runs over all possible values \(x\) of \(X\), and \(f(x)\) is the PMF.

Variance

\[ \sigma^{2} = \text{var}(X) = E[ ( X - \mu )^{2} ] = E(X^{2}) - (E(X))^{2} = E(X^{2}) - \mu^{2} \]

Where \(a\) is a constant:

\[ \sigma^{2}(ax) = a^{2} \sigma^{2}(x) \]

If \(x_{1}, x_{2}, ... , x_{n}\) are independent discrete random variables:

\[ \text{var}(x_{1} + x_{2} + ... + x_{n}) = \sum_{i = 1}^{n} \text{var}(x_{i}) \]
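A minimal Monte Carlo sketch (my own illustration with two independent dice, using Python's random module) shows the variance of a sum matching the sum of the variances:

```python
import random

# For independent rolls of two fair dice, var(X1 + X2) should be
# close to var(X1) + var(X2).
random.seed(0)
N = 100_000

def variance(samples):
    mu = sum(samples) / len(samples)
    return sum((s - mu) ** 2 for s in samples) / len(samples)

x1 = [random.randint(1, 6) for _ in range(N)]
x2 = [random.randint(1, 6) for _ in range(N)]
total = [a + b for a, b in zip(x1, x2)]

print(variance(x1) + variance(x2))  # ~ 5.83 (= 2 * 35/12)
print(variance(total))              # should be close to the line above
```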

Standard Deviation

\[ \sigma = std(x) = \sqrt{var(x)} \]

Special Discrete Random Variables

Bernoulli Distribution

A Bernoulli random variable \(X\) equals 1 (success) with probability \(p\) and 0 (failure) with probability \(1 - p\). Where \(X\) is a Bernoulli random variable:

\[ E(X) = p \]

\[ \sigma^{2}(X) = p (1-p) \]

Binomial Distribution

A binomial random variable counts the number of successes in \(n\) independent Bernoulli trials with success probability \(p\). PMF, where \(n\) is the number of trials and \(x\) is the number of successes:

\[ b(x; n, p) = {}^{n}C_{x} p^{x} (1 - p)^{n - x} \]

If \(X\) is a Binomial RV:

\[ E(X) = np \]

\[ \sigma^{2} (X) = np (1 - p) \]

Geometric Distribution

Geometric RV: the number of independent Bernoulli trials needed until the first success is observed.

X ~ G(p)

PMF

\[ g(x;p) = p (1 - p)^{x - 1} \]

Expected value

\[ E(x) = \frac{1}{p} \]

Variance

\[ \sigma^{2} = \frac{1-p}{p^{2}} \]

Memoryless Property:

\[ P( Y > a + b | Y > a ) = P (Y > b) \]

Tail Probability Rule:

\[ P(Y > a) = (1-p)^{a} \]
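A short sketch (illustrative values p = 0.3, a = 5, b = 3; standard library only) checks both the tail probability rule and the memoryless property:

```python
from math import isclose

# Tail rule: P(Y > a) = (1 - p)^a, checked by summing the PMF
# g(x; p) = p * (1 - p)^(x - 1) for x = 1..a.
p, a, b = 0.3, 5, 3
tail_by_sum = 1 - sum(p * (1 - p) ** (x - 1) for x in range(1, a + 1))
tail_by_rule = (1 - p) ** a
print(isclose(tail_by_sum, tail_by_rule))  # True

# Memoryless property: P(Y > a + b | Y > a) = P(Y > b)
lhs = (1 - p) ** (a + b) / (1 - p) ** a
rhs = (1 - p) ** b
print(isclose(lhs, rhs))  # True
```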

Negative Binomial Distribution

The number of trials until we observe \(k\) successes

X ~ NB(p, k)

\[ b*(x; p, k) = {}^{x - 1}C_{k - 1} p^{k} (1 - p)^{x - k} \]

Expected value:

\[ E(x) = \frac{k}{p} \]

Variance:

\[ \sigma^{2} = \frac{k (1-p)}{p^{2}} \]

Example

Bob is a high school basketball player. He’s a 70% free throw shooter, meaning his probability of making a free throw is 0.70. During the season, what is the probability that Bob makes his third free throw on his fifth shot?

For this question, we can use a negative binomial distribution:

This is because we want the \(k = 3\)rd success to occur on trial \(x = 5\).

\[ b*(5; 0.7, 3) = {}^{5 - 1}C_{3 - 1} 0.7^{3} (1 - 0.7)^{5 - 3} \]

\[ b*(5; 0.7, 3) = 0.1852 \]
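The same value falls out of a direct computation of the negative binomial PMF (a minimal Python sketch using the numbers from the problem):

```python
from math import comb

# b*(x; p, k) = C(x-1, k-1) * p^k * (1-p)^(x-k):
# probability Bob's 3rd made free throw comes on his 5th shot.
x, p, k = 5, 0.7, 3
prob = comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)
print(round(prob, 4))  # 0.1852
```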

Hypergeometric Distribution

Hypergeometric Distribution: we sample \(n\) items without replacement from a finite population of \(N\) items, of which \(k\) are successes, and let \(X\) be the number of successes in the sample.

Given these conditions, we then say that \(X \sim H(N, n, k)\).

PMF for Hypergeometric Distribution:

\[ H(x; N, n, k) = \frac{{}^{k}C_{x} * {}^{N - k}C_{n - x}}{{}^{N}C_{n}} \]

Where \(N\) is the population size, \(n\) is the sample size, \(k\) is the number of successes in the population, and \(x\) is the number of successes in the sample.

Expected value:

\[ E(X) = \frac{nk}{N} \]

Variance:

\[ \sigma^{2} = \frac{N - n}{N - 1} * n * \frac{k}{N} * \frac{N - k}{N} \]

Example

Bag has 3 red balls and 7 white balls. Select 2 balls:

  • Sample without replacement (equivalent to taking two balls simultaneously)

Assuming choosing a red ball is a success, what is the probability that both selected balls are red?

For this, we can use the hypergeometric distribution with \(N = 10\), \(n = 2\), \(k = 3\), and \(x = 2\):

\[ H(x; N, n, k) = \frac{{}^{k}C_{x} * {}^{N - k}C_{n - x}}{{}^{N}C_{n}} \]

\[ H(2; 10, 2, 3) = \frac{{}^{3}C_{2} * {}^{10 - 3}C_{2 - 2}}{{}^{10}C_{2}} \]

\[ H(2; 10, 2, 3) = 0.0667 \]

Example

An industrial product is shipped in lots of 20. Testing to determine whether an item is defective is costly, and hence the manufacturer samples his production rather than using a 100% inspection plan. A sampling plan, constructed to minimize the number of defectives shipped to customers, calls for sampling five items from each lot and rejecting the lot if more than one defective is observed. (If the lot is rejected, each item in it is later tested.) If a lot contains four defectives, what is the probability that it will be rejected? What is the expected number of defectives in the sample of size 5?

For this question, we can use a hypergeometric distribution with \(N = 20\), \(n = 5\), and \(k = 4\).

Notice that \(x\) can only range from 0 to \(k\), i.e. 0 to 4, since we only have 4 defectives in the population. The lot is rejected when more than one defective is observed, so we want P(X > 1) = 1 - (P(X = 0) + P(X = 1)).

When X = 0

\[ H(0; 20, 5, 4) = \frac{{}^{4}C_{0} * {}^{20 - 4}C_{5 - 0}}{{}^{20}C_{5}} \]

When X = 1

\[ H(1; 20, 5, 4) = \frac{{}^{4}C_{1} * {}^{20 - 4}C_{5 - 1}}{{}^{20}C_{5}} \]

Thus, the probability for P(X > 1) is:

\[ 1 - ((\frac{{}^{4}C_{0} * {}^{20 - 4}C_{5 - 0}}{{}^{20}C_{5}}) + (\frac{{}^{4}C_{1} * {}^{20 - 4}C_{5 - 1}}{{}^{20}C_{5}})) \]

Which is 0.2487, our answer.

We can find the expected number of defectives using:

\[ E(X) = \frac{nk}{N} \]

\[ E(X) = \frac{5 * 4}{20} \]

\[ E(X) = 1 \]

So we expect there to be 1 defective for a sample size of 5.
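A quick Python check of both answers (the helper hyper_pmf is my own; standard library only):

```python
from math import comb

# Lot-rejection example: N = 20 items, k = 4 defectives, sample n = 5,
# reject if more than 1 defective is found.
N, n, k = 20, 5, 4

def hyper_pmf(x):
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

p_reject = 1 - (hyper_pmf(0) + hyper_pmf(1))
print(round(p_reject, 4))  # 0.2487
print(n * k / N)           # expected defectives in the sample: 1.0
```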

Relation between Binomial distribution and Hypergeometric

Let us consider the hypergeometric distribution: out of a population of \(N\) items, \(k\) are successes.

Thus, we can consider \(p = \frac{k}{N}\) to be the probability of success in the population.

So let us then take a look at the expected value of the hypergeometric distribution again:

\[ E(x) = n \frac{k}{N} = np \]

Notice that it is \(np\), just like the binomial distribution. Now let us look at the variance for the hypergeometric distribution:

\[ \sigma^{2}(x) = \frac{N - n}{N - 1} n * \frac{k}{N} * \frac{N - k}{N} = \frac{N - n}{N - 1} * n p (1 - p) \]

Besides the \(\frac{N - n}{N - 1}\) term, this looks just like the expression for the variance of a binomial distribution: \(np (1 - p)\). Consider \(\lim_{N\to\infty} \frac{N - n}{N - 1}\) with \(n\) held fixed; both the numerator and the denominator grow like \(N\), so:

\[ \lim_{N\to\infty} \frac{N - n}{N - 1} = 1 \]

So, what this says is that the hypergeometric distribution behaves just like the binomial distribution when the population size \(N\) is large relative to the sample size \(n\).
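A small numerical sketch (illustrative values n = 5, x = 2, p = 0.2) shows the hypergeometric PMF approaching the binomial PMF as N grows:

```python
from math import comb

# Compare hypergeometric and binomial probabilities for increasing N,
# keeping the success proportion p = k / N fixed at 0.2.
n, x, p = 5, 2, 0.2
for N in (20, 200, 2000):
    k = N // 5                          # keeps k / N = 0.2
    hyper = comb(k, x) * comb(N - k, n - x) / comb(N, n)
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(N, round(hyper, 4), round(binom, 4))
```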

Poisson Distribution

Consider a Poisson random variable: \(X \sim \text{Poisson}( \lambda )\)

PMF for Poisson:

\[ p(x; \lambda ) = \frac{e^{- \lambda} \lambda^{x}}{x!} \]

Where \(\lambda > 0\) is the average number of events per unit of time or space, and \(x = 0, 1, 2, ...\).

Expected value and variance:

\[ E(x) = \sigma^{2}(x) = \lambda \]

A property of the Poisson distribution: If we are given that \(x_{1} \sim \text{Poisson}(\lambda_{1})\) and \(x_{2} \sim \text{Poisson}(\lambda_{2})\), and that they are independent, then:

\[ x_{1} + x_{2} \sim \text{Poisson}(\lambda_{1} + \lambda_{2}) \]

For \(p < 0.01\) and \(n > 100\), the Poisson distribution with \(\lambda = np\) approximates the binomial distribution.
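A small sketch (illustrative values n = 1000, p = 0.005) comparing the two PMFs:

```python
from math import comb, exp, factorial

# Poisson(lambda = n*p) approximating Binomial(n, p) for large n, small p.
n, p = 1000, 0.005
lam = n * p
for x in range(4):
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    poisson = exp(-lam) * lam**x / factorial(x)
    print(x, round(binom, 4), round(poisson, 4))
```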

Continuous Random Variables

For a continuous random variable, the probability of any single point is 0.

Note for continuous RVs:

\[ P( a \leq x \leq b ) = P( a < x < b ) \]

Expected value of a Continuous RV:

\[ \mu = E(X) = \int_{- \infty}^{\infty} x f(x) dx \]

Note:

\[ E(c) = c \]

\[ E[ c g(x) ] = c E[g(x)] \]

where \(c\) is a constant.

\[ E[ g_{1}(x) + g_{2}(x) + ... + g_{k}(x) ] = E[g_{1}(x)] + E[g_{2}(x)] + ... + E[g_{k}(x)] \]

Variance of a continuous RV:

\[ E[x^{2}] - \mu^{2} = \int_{-\infty}^{\infty} x^{2} f(x) dx - \mu^{2} \]

Recall:

\[ \sigma^{2} (ax) = a^{2} \sigma^{2} (x) \]

If \(X_{1}\) and \(X_{2}\) are independent:

\[ var(aX_{1} + bX_{2}) = var(aX_{1}) + var(bX_{2}) = a^{2} var(X_{1}) + b^{2} var(X_{2}) \]

More generally speaking, if we have independent variables \(X_{1}, X_{2}, ..., X_{n}\), then:

\[ \sigma^{2} (X_{1} + X_{2} + ... + X_{n}) = \sum_{i = 1}^{n} \sigma^{2} (X_{i}) \]

Example:

Let \(X\) be a continuous RV with pdf

\[ f(x) = \begin{cases} \frac{1}{8} x & 0\leq x\leq 4 \\ 0 & \text{otherwise} \\ \end{cases} \]

Find the variance of X:

\[ \sigma^{2} (X) = E[X^{2}] - \mu^{2} \]

Let us first find \(\mu\):

\[ \mu = E(x) = \int_{-\infty}^{\infty} x f(x) dx \]

\[ \int_{0}^{4} x \frac{1}{8} x dx \]

\[ \frac{1}{8} \int_{0}^{4} x^{2} dx \]

\[ \frac{x^{3}}{24} \bigg|_{0}^{4} \]

\[ \frac{64}{24} = \frac{8}{3} \]

Now solve \(E[X^{2}]\):

\[ E[X^{2}] = \int_{-\infty}^{\infty} x^{2} f(x) dx \]

\[ E[X^{2}] = \int_{0}^{4} x^{2} \frac{1}{8} x dx \]

\[ E[X^{2}] = \frac{1}{8} \int_{0}^{4} x^{3} dx \]

\[ E[X^{2}] = \frac{x^{4}}{32} \bigg|_{0}^{4} \]

\[ E[X^{2}] = \frac{4^{4}}{32} = 8 \]

Now, subtract:

\[ E[X^{2}] - \mu^{2} = 8 - (\frac{8}{3})^{2} = \frac{72}{9} - \frac{64}{9} = \frac{8}{9} \]
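A numeric check of this example (a simple midpoint Riemann sum standing in for the integrals; illustration only):

```python
# f(x) = x/8 on [0, 4], 0 otherwise.
def integrate(g, a, b, steps=100_000):
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: x / 8
mean = integrate(lambda x: x * f(x), 0, 4)               # ~ 8/3
second_moment = integrate(lambda x: x**2 * f(x), 0, 4)   # ~ 8
print(second_moment - mean**2)                           # ~ 8/9 = 0.888...
```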

Example 2:

The length of time to failure (in hundreds of hours) for a transistor is a random variable \(Y\) with distribution given by:

\[ F(y) = \begin{cases} 0 & y < 0 \\ 1 - e^{-y^{2}} & y \geq 0 \\ \end{cases} \]

  • Show that \(F(y)\) has the properties of a cumulative distribution function.
  • Find the 0.3 quantile of \(Y\). (The 0.3 quantile of \(Y\) is the value \(y\) such that \(P(Y \leq y) = 0.3\)).
  • Find PDF \(f(y)\).
  • Find the probability that the transistor operates for at least 200 hours.
  • Find \(P(Y > 1 | Y \leq 2)\).

A cumulative distribution function is nondecreasing and tends to 1. The only nonzero part of the claimed CDF is \(1 - e^{-y^{2}}\) for \(y \geq 0\), which starts at 0 when \(y = 0\) and increases with \(y\), so we take the limit \(\lim_{y\to\infty} 1 - e^{-y^{2}}\). As \(y\) approaches infinity, the exponent \(-y^{2}\) tends to \(-\infty\), so \(e^{-y^{2}} \to 0\). Thus, \(\lim_{y\to\infty} 1 - e^{-y^{2}} = 1 - 0 = 1\). So we know \(F(y)\) has the properties of a cumulative distribution function.


Now, we will find the 0.3 quantile. Since we already have the CDF, we know for some \(y\) that \(F(y) = P(Y \leq y) = 0.3\). We can set \(1 - e^{-y^{2}} = 0.3\):

\[ 1 - e^{-y^{2}} = 0.3 \]

\[ e^{-y^{2}} = 0.7 \]

\[ -y^{2} = \ln 0.7 \]

\[ y = \sqrt{- \ln 0.7} \]

\[ y = 0.5972 \]


Now it asks us to find the PDF \(f(y)\). First, we know that for \(y < 0\), \(f(y)\) is 0, since nothing is accumulated in the CDF there. For \(y \geq 0\), we just need to differentiate \(1 - e^{-y^{2}}\).

\[ f(y) = \frac{d}{dy} \left( 1 - e^{-y^{2}} \right) = - e^{-y^{2}} \cdot (-2y) = 2y e^{-y^{2}} \]

So \(f(y) = 2y e^{-y^{2}}\) for \(y \geq 0\), and 0 elsewhere.


Now we need to find the probability that the transistor operates for at least 200 hours. Since the units are hundreds of hours, this is represented as:

\[ P(Y \geq 2) \]

This is the same as

\[ 1 - P(Y < 2) = 1 - F(2) \]

So:

\[ 1 - (1 - e^{-2^{2}}) = e^{-4} = 0.0183 \]


Now, it asks us to find \(P(Y > 1 | Y \leq 2)\).

We know by definition:

\[ P(Y > 1 | Y \leq 2) = \frac{P(Y > 1 \cap Y \leq 2)}{P(Y \leq 2)} \]

\[ P(Y > 1 | Y \leq 2) = \frac{P(1 < Y \leq 2)}{P(Y \leq 2)} \]

We can express it like this so we can use CDF to make this easy:

\[ P(Y > 1 | Y \leq 2) = \frac{P(Y \leq 2) - P(Y < 1)}{P(Y \leq 2)} \]

\[ P(Y > 1 | Y \leq 2) = \frac{F(2) - F(1)}{F(2)} \]

\[ P(Y > 1 | Y \leq 2) = 1 - \frac{F(1)}{F(2)} \]

\[ P(Y > 1 | Y \leq 2) = 1 - \frac{1 - e^{-1^{2}}}{1 - e^{-2^{2}}} \]

\[ P(Y > 1 | Y \leq 2) = 1 - \frac{1 - e^{-1}}{1 - e^{-4}} \]

\[ P(Y > 1 | Y \leq 2) = 1 - \frac{e^{-1} - 1}{e^{-4} - 1} \]

\[ P(Y > 1 | Y \leq 2) = 1 - \frac{e - 1}{e - e^{-3}} \]
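A quick numeric check of the quantile, the tail probability, and the conditional probability from this example (standard library only):

```python
from math import exp, log, sqrt

# Transistor example with CDF F(y) = 1 - exp(-y**2) for y >= 0.
F = lambda y: 1 - exp(-y**2)

quantile_03 = sqrt(-log(0.7))
print(round(quantile_03, 4))       # 0.5972

p_at_least_200h = 1 - F(2)         # P(Y >= 2) = e^{-4}
print(round(p_at_least_200h, 4))   # 0.0183

cond = (F(2) - F(1)) / F(2)        # P(Y > 1 | Y <= 2)
print(round(cond, 4))              # ~ 0.3561
```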

Special Continuous Random Variables

Uniform Random Variable

A uniform random variable means every outcome in an interval is equally likely.

Consider \(X \sim U[a,b]\). The PDF (probability density function) is given by:

\[ f(x) = \frac{1}{b - a}, \quad a \leq x \leq b \]

The CDF (cumulative distribution function) is given by:

\[ F(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b \]

Expectation (can prove using definition)

\[ \frac{a + b}{2} \]

Variance (can prove using definition)

\[ \sigma^{2} = \frac{(b - a)^{2}}{12} \]

Uniform RV example

Suppose you are going out for the evening with friends and they ask you to arrive by 9:00pm. Denote \(T\) as the length past 9:00pm when your friends arrive, and assume \(T\) is uniformly distributed between 0 and 30 mins.

  • State the distribution and parameters of \(T\)
  • What is the probability that you’ll have to wait more than 20 minutes for your friends
  • If at 9:20 pm your friends have not yet arrived, what is the probability that you have to wait at least 5 more minutes?
  • What is the probability that your friends will arrive exactly at 9:25pm?
  • Use the formulas from lecture to find the mean and variance for the amount of time that you will wait.

First, \(T \sim U[0, 30]\).

Then:

\[ P(T > 20) = \int_{20}^{30} \frac{1}{30} dt = \frac{t}{30} \bigg|_{20}^{30} = \frac{1}{3} \]

The next question is a conditional probability question:

\[ P( T \geq 25 | T > 20 ) = \frac{P(T \geq 25 \cap T > 20 )}{P( T > 20 )} = \frac{P(T \geq 25)}{P(T > 20)} = \frac{1}{2} \]

Probability of an exact point (i.e. 9:25pm) is 0.

Expectation is \(\frac{a + b}{2} = \frac{30}{2} = 15\) and variance is \(\sigma^{2} = \frac{(b - a)^{2}}{12} = \frac{30^{2}}{12} = 75\text{ mins}^{2}\).
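A quick arithmetic check of these answers (plain Python, values from the example):

```python
# Waiting-time example: T ~ U[0, 30].
a, b = 0, 30
pdf = 1 / (b - a)

p_wait_20 = (30 - 20) * pdf             # P(T > 20) = 1/3
p_cond = ((30 - 25) * pdf) / p_wait_20  # P(T >= 25 | T > 20) = 1/2
mean = (a + b) / 2                      # 15
var = (b - a) ** 2 / 12                 # 75
print(p_wait_20, p_cond, mean, var)
```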

Normal (Gaussian) Distribution

Consider \(X \sim N(\mu, \sigma^{2})\), where \(\mu\) is the mean and \(\sigma^{2}\) is the variance.

The pdf is given by

\[ f(x) = \frac{1}{\sqrt{2\pi} \sigma} e^{-\frac{(x - \mu)^{2}}{2 \sigma^{2}}} \]

for \(- \infty < x < \infty\).

\(\mu\) determines center, \(\sigma\) determines “spread”.

The CDF has no closed-form expression (the integral can't be solved with elementary calculus), so we use tables instead.

The tables work for \(\sigma = 1\) and \(\mu = 0\), the standard normal distribution. They give \(P(Z \leq z)\), which is the CDF.

Since these tables are for the standard normal distribution, we should find a way to address the general case. We can do this by using:

\[ z = \frac{x - \mu}{\sigma} \]

Basically, this gives you the \(z\) value on the standard normal distribution corresponding to an \(x\) value from the general distribution with mean \(\mu\) and standard deviation \(\sigma\).

Critical Value: \(z_{\alpha}\) of a standard normal distribution is the value on the measurement axis for which an area of \(\alpha\) under the normal curve lies to the right of \(z_{\alpha}\).

Normal Approximation to the Binomial

Suppose \(X \sim B(n, p)\) (binomial dist). We know:

  • \(\mu = np\)
  • \(\sigma^{2} = np (1 - p)\)

We say that when \(n \to \infty\), then:

\[ z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{np (1 - p)}} \sim N(0, 1) \]

In other words, for large values of \(n\), we can approximate binomial as standard normal.

\[ P(X \leq x) = \sum_{k = 0}^{x} b(k;n,p) \approx P\left(Z \leq \frac{x + 0.5 - np}{\sqrt{np (1 - p)}}\right) \]

The \(+0.5\) term is the continuity correction.
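A small sketch comparing the exact binomial CDF with the continuity-corrected normal approximation (illustrative values n = 40, p = 0.5, x = 24; the standard normal CDF is written via math.erf):

```python
from math import comb, erf, sqrt

# Standard normal CDF via the error function.
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))

n, p, x = 40, 0.5, 24
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))
approx = Phi((x + 0.5 - n * p) / sqrt(n * p * (1 - p)))
print(round(exact, 4), round(approx, 4))  # the two should be close
```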

Gamma Distribution

A Gamma RV is nonnegative, and its PDF is skewed to the right.

e.g. time until the next customer arrives at a grocery store

Notation: \(X \sim \text{Gamma}(\alpha , \beta )\), \(\alpha\) is the shape parameter, and \(\beta\) is the scale parameter.

PDF:

\[ f(x) = \frac{x^{\alpha - 1} e^{-\frac{x}{\beta}}}{\beta^{\alpha} \Gamma( \alpha )} \]

For \(0 \leq x < \infty\).

Where:

\[ \Gamma( \alpha ) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x} dx \]

If \(n\) is an integer, \(\Gamma(n) = (n - 1)!\).

No closed form for the CDF.

Expectation:

\[ \mu = E(x) = \alpha \beta \]

Variance:

\[ \sigma^{2} = \text{Var}(x) = \alpha * \beta^{2} \]

Note:

\[ \Gamma( \alpha + 1 ) = \alpha \Gamma ( \alpha ) \]

Chi Square Distribution

If \(Y\) is a Gamma random variable with \(\beta = 2\) and \(2 \alpha\) a positive integer, then \(Y\) is a chi-square RV with \(V = 2 \alpha\) degrees of freedom:

\(Y \sim \chi^{2}(V)\)

PDF

\[ f(y) = \frac{y^{\frac{V}{2} - 1} e^{- \frac{y}{2}}}{2^{\frac{V}{2}} \Gamma(\frac{V}{2})} \]

For \(0 \leq y < \infty\), 0 elsewhere.

Expectation

\[ \mu = E(Y) = V \]

\[ \sigma^{2} = \text{Var}(Y) = 2V \]

Exponential Distribution

PDF

\[ f(y) = \begin{cases} \frac{1}{\beta} e^{- \frac{y}{\beta}} & y \geq 0 \\ 0 & \text{elsewhere} \\ \end{cases} \]

We can find a closed form of the CDF.

\[ F(y) = P(Y \leq y) = 1 - e^{- \frac{y}{\beta}}, y \geq 0 \]

Expectation:

\[ \mu = E(Y) = \beta \]

Variance:

\[ \sigma^{2} = \text{Var}(Y) = \beta^{2} \]

You may also see it written in terms of the rate parameter \(\theta\), where \(\theta = \frac{1}{\beta}\).

The exponential distribution also has the memoryless property: the future is independent of the past. The time you have already waited has no bearing on how much longer you will wait.
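A minimal check of the memoryless property using the exponential survival function \(P(Y > y) = e^{-y/\beta}\) (illustrative values \(\beta = 2\), \(a = 1\), \(b = 3\)):

```python
from math import exp, isclose

# Memoryless property: P(Y > a + b | Y > a) = P(Y > b).
beta, a, b = 2.0, 1.0, 3.0
survival = lambda y: exp(-y / beta)  # P(Y > y) = 1 - F(y)

lhs = survival(a + b) / survival(a)
rhs = survival(b)
print(isclose(lhs, rhs))  # True
```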