Expectation and Variance — Section 1: Probability Foundations

The expected value $E[X]$ is the long-run average of a random variable. The variance $\text{Var}(X)$ measures how much it fluctuates around that average. Many statistical estimators and learning algorithms are built by manipulating these two quantities.

Expectation

For a discrete random variable (RV): $E[X] = \sum_x x \cdot P(X = x)$. For a continuous RV with density $f$: $E[X] = \int x \cdot f(x) \, dx$.
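
As a small illustration of the discrete formula, the sketch below computes $E[X]$ for a fair six-sided die. The helper name `expectation` and the die example are assumptions made for illustration; exact fractions are used so the result is not blurred by floating-point rounding.

```python
from fractions import Fraction

def expectation(pmf):
    """Return E[X] = sum_x x * P(X = x) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expectation(die_pmf)
print(die_mean)  # 7/2
```

The mean 7/2 is not a value the die can actually show, which is a reminder that $E[X]$ is an average, not a typical outcome.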

Key property (linearity of expectation): $E[aX + bY] = aE[X] + bE[Y]$, even when $X$ and $Y$ are dependent. This is the single most useful identity in applied probability.
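
To see that linearity does not need independence, the sketch below uses a deliberately dependent pair: $X$ is a fair die roll and $Y = X^2$. The choice of die, of $Y$, and of the constants `a` and `b` are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical setup: X is a fair die roll and Y = X**2, so Y is fully determined by X.
faces = range(1, 7)
p = Fraction(1, 6)
a, b = 2, 3

e_x = sum(x * p for x in faces)        # E[X]
e_y = sum(x**2 * p for x in faces)     # E[Y] = E[X^2]

# Left side: E[aX + bY] computed directly from the distribution of X.
lhs = sum((a * x + b * x**2) * p for x in faces)
# Right side: a E[X] + b E[Y], using the two expectations separately.
rhs = a * e_x + b * e_y

print(lhs, rhs)  # 105/2 105/2
```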

Variance

$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$. Standard deviation $\sigma = \sqrt{\text{Var}(X)}$ is in the same units as $X$ and is usually what's reported.
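
The two expressions for the variance can be checked against each other on a concrete distribution. This is a minimal sketch, again assuming a fair six-sided die as the example; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in die_pmf.items())

# Definition: E[(X - E[X])^2]
var_def = sum((x - mean) ** 2 * p for x, p in die_pmf.items())
# Shortcut: E[X^2] - (E[X])^2
var_short = sum(x**2 * p for x, p in die_pmf.items()) - mean**2

std = math.sqrt(var_def)  # same units as X

print(var_def, var_short)  # 35/12 35/12
print(std)                 # about 1.708
```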

Variance is NOT linear: $\text{Var}(aX) = a^2 \text{Var}(X)$, and $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y)$. If $X$ and $Y$ are independent, the covariance vanishes and variances add.
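
The sum rule can be verified exactly on small joint distributions. The sketch below assumes two hypothetical cases: two independent fair dice, and a fully dependent pair where $Y$ always equals $X$. The helpers `mean`, `var`, and `cov` are illustrative names, not part of any library.

```python
from fractions import Fraction
from itertools import product

def mean(pmf, f):
    """E[f(X, Y)] under a joint pmf given as {(x, y): probability}."""
    return sum(f(x, y) * p for (x, y), p in pmf.items())

def var(pmf, f):
    """Var(f(X, Y)) computed from the definition E[(f - E[f])^2]."""
    m = mean(pmf, f)
    return mean(pmf, lambda x, y: (f(x, y) - m) ** 2)

def cov(pmf):
    """Cov(X, Y) = E[(X - E[X]) (Y - E[Y])]."""
    mx = mean(pmf, lambda x, y: x)
    my = mean(pmf, lambda x, y: y)
    return mean(pmf, lambda x, y: (x - mx) * (y - my))

faces = range(1, 7)
independent = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
dependent = {(x, x): Fraction(1, 6) for x in faces}  # Y always equals X

for joint in (independent, dependent):
    lhs = var(joint, lambda x, y: x + y)
    rhs = var(joint, lambda x, y: x) + var(joint, lambda x, y: y) + 2 * cov(joint)
    print(lhs, rhs)
# 35/6 35/6
# 35/3 35/3
```

In the dependent case $X + Y = 2X$, so the result 35/3 is also $2^2$ times the single-die variance 35/12, matching $\text{Var}(aX) = a^2 \text{Var}(X)$.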

Sample mean

Given independent samples $X_1, \dots, X_n$ from a distribution with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ has mean $\mu$ and variance $\sigma^2 / n$. The mean follows from linearity alone; the variance uses independence, so that the covariance terms vanish. The standard error $\sigma / \sqrt{n}$ scales like $1 / \sqrt{n}$: to halve the error, quadruple the sample size.
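
A quick simulation can make the $1 / \sqrt{n}$ scaling concrete. This is a rough sketch under assumed settings: draws are Uniform(0, 1), for which $\sigma^2 = 1/12$, and the sample sizes, trial count, and seed are arbitrary choices. The helper `sample_mean_spread` is a hypothetical name.

```python
import random
import statistics

def sample_mean_spread(n, trials, rng):
    """Empirical standard deviation of the mean of n independent Uniform(0, 1) draws."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(means)

rng = random.Random(0)          # fixed seed so the run is repeatable
sigma = (1 / 12) ** 0.5         # standard deviation of Uniform(0, 1)

spread_n = sample_mean_spread(25, 2000, rng)
spread_4n = sample_mean_spread(100, 2000, rng)

# Each empirical spread should sit close to the theoretical sigma / sqrt(n).
print(spread_n, sigma / 25 ** 0.5)
print(spread_4n, sigma / 100 ** 0.5)
# Quadrupling n should roughly halve the spread, so this ratio should be near 2.
print(spread_n / spread_4n)
```

The simulated values will only approximate the theory, since the spread is itself estimated from a finite number of trials.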

Moments

The $k$-th moment of $X$ is $E[X^k]$. Mean is the first moment; variance is the second central moment. Skewness ($E[(X - \mu)^3] / \sigma^3$) measures asymmetry; kurtosis ($E[(X - \mu)^4] / \sigma^4$) measures tail heaviness. Higher moments are useful for distinguishing distributions with the same mean and variance.
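
The last point can be illustrated with two small distributions that share mean 0 and variance 1 but differ in kurtosis. Both pmfs and the helper `standardized_moment` are assumptions chosen for illustration.

```python
def standardized_moment(pmf, k):
    """Return E[(X - mu)^k] / sigma^k for a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    variance = sum((x - mu) ** 2 * p for x, p in pmf.items())
    central = sum((x - mu) ** k * p for x, p in pmf.items())
    return central / variance ** (k / 2)

# Two hypothetical distributions, both with mean 0 and variance 1.
coin = {-1: 0.5, 1: 0.5}                    # all mass at +/-1
spiky = {-2: 0.125, 0: 0.75, 2: 0.125}      # mostly 0, occasional +/-2

for name, pmf in (("coin", coin), ("spiky", spiky)):
    skewness = standardized_moment(pmf, 3)
    kurtosis = standardized_moment(pmf, 4)
    print(name, skewness, kurtosis)
```

Both distributions are symmetric, so their skewness is 0, but the second places more mass far from the mean relative to its spread and has kurtosis 4 against 1 for the first.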
