The expected value is the long-run average of a random variable. The variance measures how much it fluctuates around that average. Almost every statistic and learning algorithm reduces to manipulating these.
Expectation
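For a discrete RV: $E[X] = \sum_x x \, P(X = x)$. For continuous: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$, where $f$ is the probability density of $X$.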
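As a small illustration of the discrete case, the sketch below computes an expected value as a probability-weighted sum. The helper name `expectation` and the fair-die example are assumptions chosen for illustration; exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

def expectation(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean_die = expectation(die)
print(mean_die)  # 7/2
```

No single roll ever shows 3.5; the expected value describes the long-run average over many rolls.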
Key property, linearity of expectation: $E[aX + bY] = a\,E[X] + b\,E[Y]$ for any constants $a$ and $b$, even when $X$ and $Y$ are dependent. This is the single most useful identity in applied probability.
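Linearity can be checked exactly on a pair of strongly dependent variables. In the sketch below, X is a fair die roll and Y = X², so Y is completely determined by X; the names `expect`, `e_x`, `e_y` and the constants 2 and 3 are illustrative choices, not part of the notes.

```python
from fractions import Fraction

outcomes = range(1, 7)   # faces of a fair die
p = Fraction(1, 6)       # probability of each face

def expect(g):
    """Expectation of g(X), where X is the die roll."""
    return sum(g(x) * p for x in outcomes)

e_x = expect(lambda x: x)        # E[X]
e_y = expect(lambda x: x * x)    # E[Y] with Y = X^2, fully dependent on X

lhs = expect(lambda x: 2 * x + 3 * x * x)   # E[2X + 3Y] computed directly
rhs = 2 * e_x + 3 * e_y                     # 2 E[X] + 3 E[Y]
print(lhs, rhs)  # 105/2 105/2
```

The two sides agree even though X and Y are as dependent as two variables can be; no independence assumption is used anywhere.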
Variance
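$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$, where $\mu = E[X]$. Standard deviation $\sigma = \sqrt{\mathrm{Var}(X)}$ is in the same units as $X$ and is usually what's reported.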
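The variance can be computed either as the mean squared deviation from the mean or as the mean of the square minus the square of the mean. The sketch below checks that both routes agree for a fair die; the variable names are illustrative assumptions.

```python
import math
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face equally likely

mu = sum(x * p for x in faces)                      # E[X]
var_central = sum((x - mu) ** 2 * p for x in faces) # E[(X - mu)^2]
var_shortcut = sum(x * x * p for x in faces) - mu ** 2  # E[X^2] - (E[X])^2

std_dev = math.sqrt(var_central)  # same units as X
print(var_central, var_shortcut)  # 35/12 35/12
```

With floating-point data the shortcut form can lose precision when the mean is large relative to the spread, so the centered form is usually the safer one to compute.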
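Variance is NOT linear: $\mathrm{Var}(aX) = a^2 \, \mathrm{Var}(X)$, and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$, where $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. If $X$ and $Y$ are independent, the covariance vanishes and variances add.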
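To see the covariance term at work, the sketch below enumerates two fair dice exactly and takes X as the first die and Y as the sum of both, so X and Y are dependent. All helper names (`expect`, `var`, `cov`) are hypothetical and chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expect(g):
    return sum(g(d1, d2) * p for d1, d2 in outcomes)

def var(g):
    m = expect(g)
    return expect(lambda d1, d2: (g(d1, d2) - m) ** 2)

def cov(g, h):
    mg, mh = expect(g), expect(h)
    return expect(lambda d1, d2: (g(d1, d2) - mg) * (h(d1, d2) - mh))

X = lambda d1, d2: d1          # first die
Y = lambda d1, d2: d1 + d2     # sum of both dice, dependent on X

var_sum = var(lambda d1, d2: X(d1, d2) + Y(d1, d2))   # Var(X + Y) directly
by_identity = var(X) + var(Y) + 2 * cov(X, Y)         # via the covariance identity
naive = var(X) + var(Y)                               # wrong here: ignores covariance
scaled = var(lambda d1, d2: 3 * X(d1, d2))            # Var(3X) = 9 Var(X)

print(var_sum, by_identity, naive)  # 175/12 175/12 35/4
```

Dropping the covariance term understates the variance here because X and Y move together.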
Sample mean
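Given $n$ independent samples $X_1, \dots, X_n$ from a distribution with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ has mean $\mu$ and variance $\sigma^2 / n$. The standard error $\sigma / \sqrt{n}$ scales like $1/\sqrt{n}$, so to halve the error, quadruple the sample size.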
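The square-root scaling can be checked with a rough simulation. The sketch below assumes standard normal samples, a fixed seed, and 2000 repeated experiments per sample size; these are all arbitrary choices for illustration, and the function name `sample_mean_std` is hypothetical.

```python
import random
import statistics

def sample_mean_std(n, rng, trials=2000):
    """Empirical standard deviation of the sample mean of n standard normal draws."""
    means = [sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable

se_100 = sample_mean_std(100, rng)   # theory: 1 / sqrt(100) = 0.1
se_400 = sample_mean_std(400, rng)   # theory: 1 / sqrt(400) = 0.05
ratio = se_100 / se_400              # theory: 2

print(se_100, se_400, ratio)
```

Quadrupling the sample size from 100 to 400 should roughly halve the spread of the sample mean, up to simulation noise.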
Moments
The $k$-th moment of $X$ is $E[X^k]$. Mean is the first moment; variance is the second central moment. Skewness ($E[(X - \mu)^3] / \sigma^3$) measures asymmetry; kurtosis ($E[(X - \mu)^4] / \sigma^4$) measures tail heaviness. Higher moments are useful for distinguishing distributions with the same mean and variance.
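To make the last point concrete, the sketch below builds three small discrete distributions that all have mean 0 and variance 1 but differ in skewness and kurtosis. The distributions and the helper names `central_moment` and `standardized_moment` are made up for illustration.

```python
from fractions import Fraction as F

def central_moment(pmf, k):
    """k-th central moment E[(X - mu)^k] of a discrete pmf {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** k * p for x, p in pmf.items())

def standardized_moment(pmf, k):
    """E[(X - mu)^k] / sigma^k, returned as a float (k=3 skewness, k=4 kurtosis)."""
    variance = float(central_moment(pmf, 2))
    return float(central_moment(pmf, k)) / variance ** (k / 2)

# Three hypothetical distributions, each with mean 0 and variance 1.
coin = {F(-1): F(1, 2), F(1): F(1, 2)}                 # symmetric, light tails
heavy = {F(-2): F(1, 8), F(0): F(3, 4), F(2): F(1, 8)} # symmetric, heavier tails
skewed = {F(-1, 2): F(4, 5), F(2): F(1, 5)}            # asymmetric

for name, pmf in [("coin", coin), ("heavy", heavy), ("skewed", skewed)]:
    print(name, standardized_moment(pmf, 3), standardized_moment(pmf, 4))
```

The first two moments cannot tell these three apart; the third moment separates `skewed` from the others, and the fourth separates `coin` from `heavy`.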