Hypothesis Testing — Section 3: Inferential Statistics

Hypothesis testing is a framework for asking "is the observed difference statistically meaningful, or could it plausibly have arisen by chance?" The mechanics are routine; the interpretation is where most people trip up.

Null and alternative

The null hypothesis $H_0$ is the default, usually "no effect" or "no difference." The alternative $H_a$ is what you'd like to demonstrate. You assume $H_0$ is true and compute the probability of seeing data at least as extreme as yours.
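As a minimal sketch of that logic, the Python below runs an exact two-sided binomial test on a hypothetical coin experiment. $H_0$ says the coin is fair ($p_0 = 0.5$); "at least as extreme" is taken to mean "no more probable under $H_0$ than the observed count". The function names are illustrative, not from any library.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k_obs: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value for H0: P(heads) = p0.

    Sums the H0 probability of every outcome that is no more probable
    than the observed one, so both tails are included.
    """
    p_obs = binom_pmf(k_obs, n, p0)
    total = 0.0
    for k in range(n + 1):
        pk = binom_pmf(k, n, p0)
        # Tiny relative tolerance so floating-point ties are not dropped.
        if pk <= p_obs * (1 + 1e-9):
            total += pk
    return total


# Hypothetical data: 60 heads in 100 flips of a coin assumed fair under H0.
print(binom_two_sided_p(60, 100))
```

For 60 heads in 100 flips this comes out at roughly 0.057: a 60/40 split looks lopsided, yet under a fair coin it is not quite unusual enough to clear the conventional 0.05 threshold.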

p-value

The probability of observing data at least as extreme as what you saw, assuming $H_0$ is true. Small $p$ → the data would be unusual if $H_0$ held → reject $H_0$. Conventional threshold: $p < 0.05$.
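The definition can be applied directly by simulation. The sketch below is a two-sided permutation test for a difference in means, one of several ways to obtain a p-value; it assumes the group labels are exchangeable under $H_0$, and `permutation_p_value` is a hypothetical helper name.

```python
import random
from statistics import fmean


def permutation_p_value(a: list[float], b: list[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in group means.

    Under H0 the group labels carry no information, so shuffling the pooled
    data and re-splitting it simulates the null distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction: counts the observed labelling, so p is never 0.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Made-up measurements for two groups.
print(permutation_p_value([5.1, 4.8, 5.6, 5.3], [4.2, 4.5, 4.0, 4.4]))
```

The result is an estimate of $p$; more permutations make it more precise. The add-one correction stops a finite simulation from reporting a p-value of exactly zero, which it cannot justify.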

What p is NOT

NOT "the probability that $H_0$ is true"
NOT "the probability that the effect is real"
NOT "the probability of a false positive across all your experiments"

It is purely a statement under the null. To talk about the probability $H_0$ is true, you need Bayesian inference and a prior.
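To make the contrast concrete, the sketch below computes $P(H_0 \mid \text{data})$ with Bayes' rule for a hypothetical coin that landed heads 60 times in 100 flips (a two-sided binomial p-value of about 0.057). It assumes two point hypotheses, $H_0: \theta = 0.5$ and an illustrative $H_1: \theta = 0.6$; a fuller analysis would place a prior over $\theta$ rather than pick a single alternative. `posterior_h0` is a hypothetical name.

```python
from math import comb


def posterior_h0(k: int, n: int, prior_h0: float,
                 theta0: float = 0.5, theta1: float = 0.6) -> float:
    """P(H0 | data) for two point hypotheses about a coin's heads rate.

    H0: theta = theta0; H1: theta = theta1. The default theta1 = 0.6 is an
    illustrative assumption, not something the data or the test supply.
    """
    like0 = comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
    like1 = comb(n, k) * theta1**k * (1 - theta1) ** (n - k)
    # Bayes' rule: prior x likelihood, normalised over the two hypotheses.
    evidence = prior_h0 * like0 + (1 - prior_h0) * like1
    return prior_h0 * like0 / evidence


for prior in (0.5, 0.9):
    print(prior, posterior_h0(60, 100, prior))
```

Under these assumptions a 50/50 prior gives a posterior for $H_0$ of roughly 0.12, while a prior of 0.9 on $H_0$ gives roughly 0.55. The data, and therefore the p-value, are identical in both runs; only the prior changed.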

Type I and Type II errors

Type I (false positive): rejecting $H_0$ when it's true. Its probability is $\alpha$, your significance threshold. Type II (false negative): failing to reject $H_0$ when $H_a$ is true. Its probability is $\beta$, which depends on the true effect size and the sample size; statistical power is $1 - \beta$.
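Both error rates can be computed directly in a simple case. The sketch below assumes a two-sided one-sample z-test with known standard deviation $\sigma$ (an idealisation; with an estimated $\sigma$ a t-test is the usual choice) and returns the power $1 - \beta$ for a given true effect. `z_test_power` is a hypothetical name; `NormalDist` is from Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma.

    effect is the true mean minus the H0 mean. The test rejects when
    |Z| > z_crit, where z_crit leaves alpha / 2 in each tail.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sigma  # mean of Z when the true effect is `effect`
    # Probability that Z ~ N(shift, 1) lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


print(z_test_power(0.0, 1.0, 30))   # H0 true: this is the Type I error rate
print(z_test_power(0.5, 1.0, 32))   # H_a true: this is the power
```

With `effect=0`, $H_0$ is true and the "power" collapses to $\alpha$. With `effect=0.5, sigma=1, n=32` it comes out at about 0.81, so $\beta \approx 0.19$; doubling $n$ raises the power further.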

One-tailed vs two-tailed

A two-tailed test rejects if the effect is large in EITHER direction; a one-tailed test rejects only in one. Choose one-tailed only if you've pre-specified the direction and you wouldn't care about an effect in the other direction. At the same $\alpha$, one-tailed has more power in its chosen direction (and none in the other); picking it after seeing which way the data point, rather than pre-registering it, is p-hacking.
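The sketch below converts a test statistic into one- and two-tailed p-values, assuming the statistic is standard normal under $H_0$ (as for a z-test). `tail_p_values` is a hypothetical name.

```python
from statistics import NormalDist


def tail_p_values(z: float) -> dict[str, float]:
    """One- and two-tailed p-values for a statistic that is N(0, 1) under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        # Rejects for large |z| in either direction.
        "two_tailed": 2 * (1 - std_normal.cdf(abs(z))),
        # Only valid if "effect > 0" was the pre-specified direction.
        "upper_tailed": 1 - std_normal.cdf(z),
        # Only valid if "effect < 0" was the pre-specified direction.
        "lower_tailed": std_normal.cdf(z),
    }


print(tail_p_values(1.8))
```

For `z = 1.8` the upper-tailed p-value is about 0.036 and the two-tailed one about 0.072. The same data clear 0.05 only under the one-tailed test, which is why switching to it after seeing the sign of the effect inflates the false-positive rate.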

Multiple testing

If every null is true, 20 independent tests at $\alpha = 0.05$ yield ~1 false positive on average ($20 \times 0.05 = 1$). Correction methods: Bonferroni (divide $\alpha$ by the number of tests; controls the family-wise error rate but is conservative) and Benjamini-Hochberg (controls the false discovery rate; less conservative). Always correct when testing many hypotheses.
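A minimal sketch of the arithmetic and of both corrections in plain Python follows. The function names are illustrative; libraries such as statsmodels (`multipletests`) ship tested implementations, and the p-values below are hypothetical.

```python
def chance_of_any_false_positive(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) over m independent tests, all nulls true.

    The expected number of false positives is simply m * alpha.
    """
    return 1 - (1 - alpha) ** m


def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate.

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Ten hypothetical p-values, sorted for readability.
ps = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(chance_of_any_false_positive(20))
print(bonferroni(ps))
print(benjamini_hochberg(ps))
```

With 20 true nulls the chance of at least one false positive is about 0.64. On the ten p-values above, Bonferroni (threshold $0.05 / 10 = 0.005$) rejects only the first, while Benjamini-Hochberg rejects the first two, illustrating why it is called less conservative.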
