Random variables, say \(X\), are variables whose values are determined by the outcome of some random phenomenon; the set of possible values is called the state space.
There are two types of random variables: discrete and continuous.
Mostly, discrete random variables will be used in this course.
Consider a column vector \(D\) of observations drawn from a random variable \(X\).
We can assume that the observed data is a random sample drawn from \(X\): each \(x_{i}\) is iid (independent and identically distributed).
In general, the distribution from which \(X\) is drawn, as well as its moments, are unknown. We have the sample, and from the sample we can estimate its distribution and moments. Hopefully these are close to the population distribution and moments.
Suppose we have a discrete variable \(X\) that can take the values 1, 2, 3, 4, each with some probability.
The empirical probability mass function of a sample from \(X\) can then be visualized as a histogram and is given by the following equation:
\[ \hat{f}(x) = P(X = x) = \frac{1}{n} \sum_{i=1}^{n} I(x_{i} = x) \]
Where \(I\) is the indicator function:
\[ I(x_{i} = x) = \begin{cases} 1 & \text{if } x_{i} = x \\ 0 & \text{if } x_{i} \neq x \\ \end{cases} \]
In other words, \(\hat{f}(x)\) maps each observation \(x_{i}\) in the sample to 1 or 0, depending on whether it is equal to \(x\). Summing these values counts how many observations equal \(x\), and dividing by \(n\) turns that count into a relative frequency. The counting step is similar to how we can sum logical values in R
values <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4)
# There are 2 1s in the values vector
sum(values == 1)
## [1] 2
Here \(\hat{f}(1) = 2/10 = 0.2\): the sum counts how many 1s there are in the sample (2), and dividing by \(n = 10\) gives the estimated probability.
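As a quick sketch (reusing the values vector from above), the whole empirical PMF \(\hat{f}\) can be computed in R by counting each value and dividing by the sample size:
values <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4)
# Count each distinct value and divide by n to get relative frequencies
pmf_hat <- table(values) / length(values)
pmf_hat
## values
##   1   2   3   4
## 0.2 0.3 0.4 0.1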
A probability density function (PDF) is similar to a probability mass function (PMF), but it is for continuous random variables.
A density plot visualizes the underlying probability distribution of some data by drawing a continuous curve.
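For instance, a minimal sketch using simulated data (not any particular dataset from the course):
# Simulate 100 values and plot an estimate of their density
data <- rnorm(100)
plot(density(data), main = "Density Plot")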
Mean (sample):
\[ \hat{\mu} = \frac{1}{n} \sum_{i = 1}^{n} x_{i} \]
Is the mean robust/stable? (Robustness means the measure is not strongly affected by extreme values.) No: a single extreme value can pull the mean arbitrarily far, so the mean is not robust.
Expectation of an r.v.: what is the difference between the expected value and the (sample) mean?
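For a discrete random variable, the expected value is the probability-weighted average of its possible values:
\[ E[X] = \sum_{x} x \, P(X = x) \]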
# Suppose we are rolling a die
x <- c(1, 2, 3, 4, 5, 6)
# The probability of rolling any of them is 1/6
probabilities <- rep(1/6, each = 6)
# What is the average value of the die in the long run? (This is the expected value)
expected_value <- 0
for (i in 1:length(x)) {
expected_value <- expected_value + (x[i] * probabilities[i])
}
print(paste("Expected value:", expected_value))
## [1] "Expected value: 3.5"
# Now suppose we roll the die just 10 times, what is the average? (This is the mean)
rolls <- c(5, 2, 6, 2, 2, 1, 2, 3, 6, 1)
print(paste("Mean: ", mean(rolls)))
## [1] "Mean: 3"
Notice that the mean does not equal the expected value: the mean is the average of a finite number of observations, while the expected value is the average in the long run.
Properties of expectation: expectation is linear, so \(E[aX + b] = aE[X] + b\) and \(E[X_{1} + X_{2}] = E[X_{1}] + E[X_{2}]\).
Median (sample):
\[ P(X \leq m) \geq \frac{1}{2} \text{ and } P(X \geq m) \geq \frac{1}{2} \]
Is the median stable? Yes, it isn’t affected by extreme values, and it’s an actual value that the random variable takes.
Mode is the value at which the PMF attains its maximum value (which value appears the most).
Not really a useful measure of central tendency, since it doesn’t really tell you about the center.
Measures of dispersion include the variance and the standard deviation:
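For reference, the standard sample formulas (using the same \(\frac{1}{n-1}\) convention that appears in the covariance formula below) are:
\[ \hat{\sigma}^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \hat{\mu})^{2}, \qquad \hat{\sigma} = \sqrt{\hat{\sigma}^{2}} \]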
Two related measures are the maximal deviation and the mean absolute deviation, which satisfy:
\[ \mathrm{mad}(X) \leq \mathrm{stddev}(X) \leq \mathrm{maxdev}(X) \]
If we have two random variables \(X_{1}\) and \(X_{2}\), how do we find the measures of central tendency for them?
Geometrically, we can view the random variables in n-D space as column vectors.
The first and second moments (mean and variance respectively) can be computed in the same manner, but a vector is returned.
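A minimal sketch in R (the two-column matrix X here is made-up data, not from the course):
# Two random variables observed together, stored as the columns of a matrix
X <- cbind(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 4, 3, 6))
colMeans(X)       # mean vector: one mean per variable
apply(X, 2, var)  # variance of each variable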
Measure of association: Covariance
Covariance is the measure of association or linear dependence between two variables \(X_{1}\) and \(X_{2}\):
\[ \operatorname{cov}(X_{1}, X_{2}) = \hat{\sigma}_{12} = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i1} - \hat{\mu}_{1})(x_{i2} - \hat{\mu}_{2}) \]
A covariance matrix is a \(d \times d\) matrix, where \(d\) is the number of random variables. The value at row \(i\) and column \(j\) is the covariance between the \(i\)-th and \(j\)-th random variables. Along the diagonal, \(i = j\), so the diagonal entries are just the variances of the individual random variables.
The total variance of a covariance matrix is the sum of its diagonal entries, i.e., the sum of the variances of the individual random variables. This sum is also called the trace of the matrix.
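A minimal R sketch (same made-up data as in the earlier sketch):
X <- cbind(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 4, 3, 6))
S <- cov(X)     # d-by-d covariance matrix; the diagonal holds the variances
sum(diag(S))    # trace of S = total variance
## [1] 6.2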
Related to covariance is correlation.
Correlation between two variables \(X_{1}\) and \(X_{2}\) is the standardized covariance, obtained by normalizing the covariance by the standard deviation of each variable.
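In symbols, using the sample quantities defined above:
\[ \hat{\rho}_{12} = \frac{\hat{\sigma}_{12}}{\hat{\sigma}_{1} \hat{\sigma}_{2}} \]
In R, cor(x1, x2) computes this directly, and cor(X) gives the full correlation matrix of a data matrix X.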
Why do we have both covariance and correlation then? Covariance depends on the units and scale of the variables, while correlation is unitless and always lies in \([-1, 1]\), so it can be compared across different pairs of variables.
And of course, correlation != causation.
Gaussian distributions are also known as normal distributions.
# Sample of 100 normally distributed numbers
data <- rnorm(100)
hist(data, main = "Normal Distribution")
Binomial distributions are parameterized by the number of trials \(n\) and the probability of success in each trial, \(p\).
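For example, a quick sketch (the particular numbers are arbitrary):
# P(X = 25) for a binomial r.v. with n = 50 trials and success probability p = 0.5
dbinom(25, size = 50, prob = 0.5)
## [1] 0.1122752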
The Poisson distribution expresses the probability of a given number of events \(k\) occurring in a fixed interval of time if the events occur with a known constant mean rate \(\lambda\) and independently of the time since the last event (the memoryless property).
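A minimal sketch (lambda = 3 is an arbitrary choice):
# P(X = 2) when events occur at a mean rate of lambda = 3 per interval
dpois(2, lambda = 3)
## [1] 0.2240418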
Power laws describe relationships where one quantity varies as a power of another (for example, the area of a square quadruples when its side length is doubled).
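For instance:
side <- 2
(2 * side)^2 / side^2  # doubling the side multiplies the area by 2^2 = 4
## [1] 4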
a <- rbinom(100, 50, 0.5)
b <- rnorm(100, 3.0, 1.0)
plot(a, b, xlab="r.v. from binomial dist.", ylab = "r.v. from normal dist.", main = "Graph!")
Boxplots: These tell us 5-number summaries (min, first quartile, median, third quartile, max).
a <- rbinom(100, 50, 0.5)
boxplot(a)
The empirical cumulative distribution function (ECDF) shows, for each value \(x\), the fraction of observations less than or equal to \(x\):
a <- c(2, 7, 8, 9, 10, 15, 16, 20)
plot(ecdf(a), verticals = T, do.points = F)