Section 11 · Lesson 11.5

Logistic and Generalized Linear Models

Regression when the response isn't normally distributed.

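When $y$ is binary, a count, or a positive continuous quantity, OLS can produce impossible predictions (probabilities below 0 or above 1, negative counts), and its assumption of constant-variance normal errors does not hold. Generalized Linear Models (GLMs) fix this by relating the mean of $y$ to the linear predictor $X\beta$ through a link function $g$, so that $g(E[y]) = X\beta$.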
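To see the failure concretely, here is a minimal sketch that fits OLS to a binary response using the closed-form simple-regression formulas. The six data points are invented for illustration. Even the in-sample fitted values leave the interval $[0, 1]$:

```python
# Ordinary least squares on a binary response (toy data, invented for illustration).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form simple-regression estimates: slope = Sxy / Sxx.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Fitted "probabilities": the first is below 0 and the last is above 1.
fitted = [intercept + slope * x for x in xs]
print([round(f, 3) for f in fitted])
```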

Logistic regression handles binary $y \in \{0, 1\}$:

$$P(y = 1 \mid x) = \sigma(x^\top\beta) = \frac{1}{1 + e^{-x^\top\beta}}$$

Here $x^\top$ is one row of $X$ and $\sigma$ is the logistic (sigmoid) function.
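The sigmoid in the equation above maps any real linear predictor to a probability in $(0, 1)$. Its inverse is the logit, or log-odds, $\log\frac{p}{1-p}$, which is the link function of logistic regression. The following is a small sketch of the pair. The function names are illustrative, and this naive form can overflow in `math.exp` for very large negative inputs.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps a real linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Log-odds: the inverse of sigmoid, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Round trip: logit(sigmoid(z)) recovers z (up to floating-point error).
for z in [-5.0, 0.0, 2.0]:
    p = sigmoid(z)
    print(z, round(p, 4), round(logit(p), 6))
```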

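The model is fit by maximum likelihood. Each coefficient $\beta_j$ is the change in the log-odds $\log\frac{p}{1-p}$ for a one-unit increase in $x_j$, holding the other predictors fixed. Equivalently, $e^{\beta_j}$ is the factor by which that increase multiplies the odds.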
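A minimal sketch of the maximum-likelihood fit follows. It assumes a single predictor plus an intercept and uses Newton-Raphson on the log-likelihood, which for the logistic model coincides with iteratively reweighted least squares. The toy data are invented, and the function name `fit_logistic` is hypothetical. The last line checks the log-odds interpretation: the ratio of odds at $x = 3$ and $x = 2$ equals $e^{\beta_1}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood fit of P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Toy sketch for one predictor; assumes the classes are not perfectly
    separable (otherwise the MLE does not exist and the estimates diverge).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Score: gradient of the log-likelihood, X^T (y - p).
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        # Fisher information X^T W X with weights w = p(1 - p).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += info^{-1} @ score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)

def odds(x):
    """Odds p / (1 - p) implied by the fitted model at predictor value x."""
    p = sigmoid(b0 + b1 * x)
    return p / (1.0 - p)

# A one-unit step in x adds b1 to the log-odds, i.e. multiplies the odds by exp(b1).
print(round(b1, 3), round(math.exp(b1), 3), round(odds(3) / odds(2), 3))
```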

Other widely used GLMs:

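Poisson regression for count data uses the log link, $\log E[y] = X\beta$.
Gamma regression for positive continuous data uses the inverse link, $1/E[y] = X\beta$. A log link is also common in practice, since it keeps the fitted mean positive.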
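Each model above pairs a link $g$ with its inverse, $E[y] = g^{-1}(X\beta)$. This is a small sketch of the three pairs. The dictionary `LINKS` and its key names are illustrative, not a library API. The round trip $g(g^{-1}(\eta)) = \eta$ holds for each.

```python
import math

# (link g, inverse link g^{-1}) for the three models named above.
LINKS = {
    "logistic (logit)": (lambda mu: math.log(mu / (1 - mu)),
                         lambda eta: 1 / (1 + math.exp(-eta))),
    "poisson (log)": (math.log, math.exp),
    "gamma (inverse)": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}

eta = 0.8  # an example value of the linear predictor x^T beta
for name, (g, g_inv) in LINKS.items():
    mu = g_inv(eta)  # mean implied by the linear predictor
    print(f"{name}: mu = {mu:.4f}, g(mu) = {g(mu):.4f}")
```

The inverse links differ in what they guarantee. `math.exp` is positive for any $\eta$, but `1 / eta` gives a negative, invalid Gamma mean whenever $\eta < 0$. This is why the log link is often preferred for Gamma models.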

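The general recipe is: pick an exponential-family distribution for $y$ and a link function $g$ relating its mean to $X\beta$, then fit by maximum likelihood. Interpretation becomes nonlinear, because a coefficient acts on $g(E[y])$ rather than on $E[y]$ directly. The OLS toolkit still carries over in adapted form:

- Residual diagnostics use deviance or Pearson residuals instead of raw residuals.
- Regularization adds an L1 or L2 penalty to the negative log-likelihood instead of to the squared error.
- Inference relies on asymptotic Wald and likelihood-ratio tests instead of exact $t$ and $F$ tests.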
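The recipe can be sketched as one fitting routine parameterized by the family. It uses iteratively reweighted least squares (Fisher scoring), a standard way to compute GLM maximum-likelihood estimates. Each pass regresses a working response $z = \eta + (y - \mu)\,\frac{d\eta}{d\mu}$ on $x$ with weights $w = (d\mu/d\eta)^2 / V(\mu)$, where $V$ is the family's variance function. This is a minimal sketch assuming one predictor plus an intercept, with no convergence check or step-halving. The function name `irls`, its parameters, and the toy counts are hypothetical.

```python
import math

def irls(xs, ys, inv_link, dmu_deta, variance, eta0, iters=50):
    """Fit E[y] = inv_link(b0 + b1*x) by iteratively reweighted least squares.

    Each step does weighted least squares of the working response z on x with
      z = eta + (y - mu) / (dmu/deta),  w = (dmu/deta)^2 / V(mu).
    """
    b0, b1 = eta0, 0.0  # start from an intercept-only linear predictor
    for _ in range(iters):
        etas = [b0 + b1 * x for x in xs]
        mus = [inv_link(e) for e in etas]
        ds = [dmu_deta(e) for e in etas]
        zs = [e + (y - m) / d for e, y, m, d in zip(etas, ys, mus, ds)]
        ws = [d * d / variance(m) for d, m in zip(ds, mus)]
        # Closed-form weighted least squares for (b0, b1).
        sw = sum(ws)
        xw = sum(w * x for w, x in zip(ws, xs)) / sw
        zw = sum(w * z for w, z in zip(ws, zs)) / sw
        sxz = sum(w * (x - xw) * (z - zw) for w, x, z in zip(ws, xs, zs))
        sxx = sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
        b1 = sxz / sxx
        b0 = zw - b1 * xw
    return b0, b1

# Poisson family with log link: mu = exp(eta), dmu/deta = exp(eta), V(mu) = mu.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 0, 2, 3, 5, 8]  # toy counts, invented for illustration
b0, b1 = irls(xs, ys, math.exp, math.exp, lambda mu: mu,
              eta0=math.log(sum(ys) / len(ys)))
print(round(b0, 3), round(b1, 3))
```

Swapping in a different family only changes the three function arguments. For example, the logistic model would pass the sigmoid as both `inv_link` and (via $\mu(1-\mu)$) `dmu_deta`, with `variance=lambda mu: mu * (1 - mu)`.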

In a logistic regression for default risk, the coefficient on debt-to-income is $0.5$ (with debt-to-income measured in tenths). Which is correct?