Linear regression fits a line (or hyperplane) through data by minimizing the sum of squared residuals. It's the workhorse of statistics — simple, interpretable, and a building block for nearly every fancier method.
The model
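$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i$, with errors $\varepsilon_i$ i.i.d. across observations (mean zero, constant variance $\sigma^2$).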
OLS estimator
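Minimize $\lVert y - X\beta \rVert^2 = \sum_i (y_i - x_i^\top \beta)^2$. Closed form: $\hat\beta = (X^\top X)^{-1} X^\top y$, where $X$ is the design matrix with a column of ones for the intercept.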
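The closed form can be checked numerically. The following is a minimal sketch using NumPy on synthetic data; the helper name `fit_ols`, the true coefficients (1, 2, -3), and the noise level are illustrative assumptions, not part of the text above.

```python
import numpy as np

def fit_ols(X_raw, y):
    """Return OLS coefficients [intercept, slopes...] for predictors X_raw."""
    n = X_raw.shape[0]
    # Design matrix: a leading column of ones carries the intercept.
    X = np.column_stack([np.ones(n), X_raw])
    # Least-squares solve. When X has full column rank this equals
    # (X^T X)^{-1} X^T y, without forming the inverse explicitly.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# Synthetic data (assumed for illustration): y = 1 + 2*x1 - 3*x2 + noise
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X_raw[:, 0] - 3.0 * X_raw[:, 1] + rng.normal(scale=0.5, size=200)

beta_hat = fit_ols(X_raw, y)

# Same estimate from the normal equations X^T X beta = X^T y.
X = np.column_stack([np.ones(len(y)), X_raw])
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)
print(beta_normal)
```

Both routes should agree to floating-point precision. In practice a least-squares solver is usually preferred over inverting $X^\top X$, which can be badly conditioned when predictors are nearly collinear.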
What the coefficients mean
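$\beta_j$ is the expected change in $y$ for a one-unit increase in $x_j$, holding all other predictors constant. The "all else equal" qualifier matters: the coefficient is defined relative to the other predictors in the model, so it is interpretable only when those predictors are measured and modeled. Drop or add a correlated predictor and $\beta_j$ answers a different question.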
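A small simulation can make the "holding all other predictors constant" qualifier concrete. This is a sketch on synthetic data; the variable names, the correlation between `x1` and `x2`, and the true coefficients are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(columns, y):
    """OLS coefficients with an intercept prepended to the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols([x1, x2], y)   # x1 coefficient: effect of x1 with x2 held fixed
b_short = ols([x1], y)      # x1 coefficient: also absorbs x2's effect via x1

print("x1 coefficient, x2 in the model:", b_full[1])
print("x1 coefficient, x2 omitted:     ", b_short[1])
```

With `x2` in the model, the coefficient on `x1` should land near the simulated value of 2. With `x2` omitted, it should land near 2 + 3 × 0.8 = 4.4, because `x1` now also stands in for the part of `x2` that moves with it.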
$R^2$
Fraction of variance in $y$ explained by the model. $R^2 = 1 - \mathrm{SS}_{\mathrm{res}} / \mathrm{SS}_{\mathrm{tot}}$. Range $[0, 1]$ for a model with intercept. Higher is better but easy to game by adding predictors; use adjusted $R^2$ or cross-validated $R^2$ instead.
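To see how adding predictors inflates the in-sample fit, here is a sketch that computes $R^2$ and the usual adjusted $R^2 = 1 - (1 - R^2)\,(n - 1)/(n - p - 1)$ before and after appending pure-noise columns. The helper name `r2_and_adjusted` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def r2_and_adjusted(X_raw, y):
    """Return (R^2, adjusted R^2) for an OLS fit with intercept."""
    n, p = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ss_res = np.sum((y - X @ beta) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes each extra predictor through the n - p - 1 term.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=(n, 1))
y = 0.5 * x[:, 0] + rng.normal(size=n)
junk = rng.normal(size=(n, 20))               # predictors unrelated to y

r2_small, adj_small = r2_and_adjusted(x, y)
r2_big, adj_big = r2_and_adjusted(np.hstack([x, junk]), y)

print("one real predictor:   R2 =", r2_small, " adjusted =", adj_small)
print("plus 20 junk columns: R2 =", r2_big, " adjusted =", adj_big)
```

In-sample $R^2$ cannot decrease when columns are added to a least-squares fit, so the second $R^2$ is at least as large as the first even though the extra columns carry no signal. The adjusted value is discounted for the added columns.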
Inference
Each coefficient has a standard error and a t-statistic. Coefficients with $|t| > 2$ (roughly $p < 0.05$) are "significant." Significance speaks to "is this effect distinguishable from zero," not "is this effect important." A tiny coefficient on a precisely-measured variable can be significant but meaningless.
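The standard errors and t-statistics can be computed directly from the fit. The sketch below uses the classical homoscedastic formula $\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1}$ with $\hat\sigma^2 = \mathrm{SS}_{\mathrm{res}} / (n - k)$; the helper name `ols_inference`, the sample size, and the true slope of 0.02 are assumptions chosen to illustrate a significant but tiny effect.

```python
import numpy as np

def ols_inference(X, y):
    """Return (beta_hat, standard errors, t-statistics) for design matrix X."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # classical covariance of beta_hat
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Large sample, very small true slope (assumed values for illustration).
rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
y = 0.02 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, se, t = ols_inference(X, y)

print("slope:", beta[1], " SE:", se[1], " t:", t[1])
```

With this much data the slope's t-statistic comfortably exceeds 2, yet a one-unit change in `x` moves `y` by only about 0.02 against noise with standard deviation 1. The test says the effect is nonzero; whether it matters is a separate question.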
When OLS fails
OLS is BLUE (Best Linear Unbiased Estimator) under the Gauss-Markov assumptions: linearity, errors with mean zero given the predictors, independence (uncorrelated errors), homoscedasticity (equal variance of errors), and no perfect multicollinearity. Normal errors are not required for BLUE; they matter for exact small-sample inference, not for estimation. Real data violates some of these all the time; diagnostic plots (residuals vs fitted, Q-Q, leverage) tell you which.
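The quantities behind two of those diagnostic plots can be computed without any plotting library. The following sketch builds data that deliberately violates homoscedasticity and contains one planted high-leverage point, then computes leverage (the diagonal of the hat matrix) and a crude numeric stand-in for the residuals-vs-fitted plot. All data and names here are assumptions for illustration; in practice the plots themselves are more informative than a single summary number.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0.0, 10.0, size=n)
x[0] = 25.0   # planted point far from the rest of the x values
# Noise standard deviation grows with x: a homoscedasticity violation.
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.2 + 0.5 * x)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

# Leverage h_ii = diag of X (X^T X)^{-1} X^T, computed from a thin QR of X.
Q, _ = np.linalg.qr(X)
leverage = np.sum(Q ** 2, axis=1)

# If residual spread were constant, |residual| would be roughly
# uncorrelated with the fitted values.
spread_trend = np.corrcoef(fitted, np.abs(resid))[0, 1]

print("highest-leverage observation:", int(np.argmax(leverage)))
print("sum of leverages (equals number of columns of X):", leverage.sum())
print("corr(fitted, |resid|):", spread_trend)
```

The planted point has the largest leverage, and the clearly positive correlation between fitted values and absolute residuals is the numeric counterpart of a fan shape in the residuals-vs-fitted plot.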