A/B testing randomly assigns users to a treatment group or a control group and compares outcomes. It's how product teams measure causal effects in production. Simple in principle, full of pitfalls in practice.
The setup
- Randomly assign users to groups, usually equal-sized (50/50)
- Define the success metric before the test starts
- Fix the sample size in advance with a power analysis
- Run until the target sample size is reached
- Analyze once, at the end
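The setup only says assignment must be random. A common production approach, assumed here rather than prescribed by this text, is to hash a stable user ID together with an experiment name. Each user then lands in the same group on every visit, and different experiments are assigned independently. The function `assign_variant` and its parameters are hypothetical. This is a sketch, not any particular platform's API.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' (hypothetical helper).

    Hashing "experiment:user_id" gives a stable, roughly uniform bucket, so the
    same user always gets the same variant within one experiment.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the first 16 hex digits (64 bits) as a number in [0, 1).
    bucket = int(digest[:16], 16) / 16**16
    return "treatment" if bucket < treatment_share else "control"


# Usage: the same user always gets the same answer for a given experiment.
print(assign_variant("user-42", "new-checkout"))
```

Including the experiment name in the hash is the design choice that matters. Without it, the same users would end up in treatment for every experiment.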
Power analysis
Decide ahead of time how big an effect you want to detect, at what significance level (typically α = 0.05, i.e. 95% confidence) and with what power (typically 80%). Then compute the required sample size. Underpowered tests miss real effects; overpowered tests expose more users than necessary to a variant that may turn out to be worse.
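For comparing two proportions, the required sample size per group is approximately:
$$n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\, p\,(1-p)}{\mathrm{MDE}^2} \approx \frac{16\, p\,(1-p)}{\mathrm{MDE}^2}$$
where MDE is the minimum detectable effect (an absolute difference in rates), p is the baseline rate, and the constant 16 corresponds to a two-sided α = 0.05 and 80% power.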
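Here is a minimal Python sketch of the per-group sample-size calculation for two proportions. It uses the standard normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² p(1-p) / MDE² and assumes a two-sided test on an absolute difference in conversion rates. `sample_size_per_group` is a hypothetical helper. It takes exact normal quantiles from the standard library instead of the rounded constant 16.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, mde: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users per group needed to detect an absolute lift `mde`
    over a baseline conversion rate with a two-sided test (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = baseline * (1 - baseline)           # Bernoulli variance at the baseline rate
    n = 2 * (z_alpha + z_beta) ** 2 * variance / mde ** 2
    return math.ceil(n)                            # round up to whole users


print(sample_size_per_group(baseline=0.10, mde=0.01))
```

For a 10% baseline and a 1 percentage point MDE, this comes out a little over 14,100 users per group. The 16 p(1 - p)/MDE^2 shortcut gives 14,400.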
Common pitfalls
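- Peeking: checking results repeatedly and stopping as soon as they look significant. Every extra look is another chance for noise to cross the threshold, so the Type I (false positive) rate ends up far above the nominal 5%. Either fix the sample size in advance and analyze once, or use sequential testing methods designed for repeated looks (e.g. group sequential designs with alpha spending).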
- Multiple comparisons: testing many metrics and claiming significance on whichever one wins. Correct for it with Bonferroni (or Holm's procedure, which controls the same error rate with more power), or accept that you're generating hypotheses, not confirming them.
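- Sample ratio mismatch (SRM): the groups aren't the size the design says they should be. This points to broken randomization, assignment or logging bugs, or selection bias. Test for it with a chi-squared goodness-of-fit test, and don't trust the results until the mismatch is explained.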
- Novelty effect: users react to anything new, so early results can regress once the novelty fades. Run tests long enough to capture steady-state behavior.
- Network effects / interference: in social products, treatment users influence control users. This breaks the standard A/B assumption that one user's outcome doesn't depend on other users' assignments. Use cluster randomization or network-aware analysis instead.
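An SRM check is usually a chi-squared goodness-of-fit test of the observed group sizes against the designed split. The sketch below computes it with the standard library for the two-group case, which has one degree of freedom. `scipy.stats.chisquare` performs the same test if SciPy is available. The helper name `srm_p_value` is hypothetical. The alert threshold of p < 0.001 used below is an assumption, though strict thresholds like that are often used in practice.

```python
import math


def srm_p_value(n_control: int, n_treatment: int, expected_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for a two-group sample ratio mismatch.

    `expected_share` is the designed fraction of users in treatment (hypothetical helper).
    """
    total = n_control + n_treatment
    expected = [total * (1 - expected_share), total * expected_share]
    observed = [n_control, n_treatment]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p = srm_p_value(50_000, 51_500)
print("SRM alert" if p < 0.001 else "split looks fine", p)
```

A 50,000 vs. 51,500 split looks close to 50/50 (about 49.3/50.7). At this traffic, though, the p-value is far below 0.001, so the experiment should be investigated before its results are read.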
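An A/A simulation shows how much peeking inflates the Type I error. Both groups share the same distribution, so every "significant" result is a false positive. This sketch makes several assumptions: each user's metric is normally distributed with unit variance, the analyst looks after every batch of 500 users per group, and the test stops at the first look with |z| > 1.96. The function name and parameters are hypothetical.

```python
import math
import random


def peeking_false_positive_rate(n_sims: int = 2000, looks: int = 20,
                                batch: int = 500, z_crit: float = 1.96,
                                seed: int = 0) -> float:
    """Fraction of A/A tests declared significant when stopping at the first |z| > z_crit.

    Each user's outcome is assumed N(0, 1) in both groups, so the sum of `batch`
    outcomes is N(0, batch) and can be drawn in a single call.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        diff_sum = 0.0  # running (sum of treatment outcomes) - (sum of control outcomes)
        for k in range(1, looks + 1):
            diff_sum += rng.gauss(0, math.sqrt(batch)) - rng.gauss(0, math.sqrt(batch))
            n_per_group = k * batch
            # Difference in means divided by its standard error sqrt(2 / n).
            z = diff_sum / math.sqrt(2 * n_per_group)
            if abs(z) > z_crit:
                false_positives += 1
                break  # the peeking analyst stops here and ships
    return false_positives / n_sims


print("analyze once:", peeking_false_positive_rate(looks=1))
print("20 peeks:    ", peeking_false_positive_rate(looks=20))
```

With a single look, the rate stays near the nominal 5%. With 20 looks, it comes out several times higher, even though there is no real effect.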
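Bonferroni tests each of m metrics at α/m. Holm's step-down procedure controls the same family-wise error rate and never rejects fewer hypotheses than Bonferroni. Below is a minimal sketch of both. The function names are hypothetical, and the metric names and p-values are made up for illustration.

```python
def bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Each of the m metrics is tested at alpha / m."""
    m = len(p_values)
    return {name: p <= alpha / m for name, p in p_values.items()}


def holm(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Holm step-down: compare the k-th smallest p-value (0-based) to alpha / (m - k)."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    significant = {name: False for name in p_values}
    for k, (name, p) in enumerate(ordered):
        if p <= alpha / (m - k):
            significant[name] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return significant


# Illustrative p-values: uncorrected, three metrics would "win" at 0.05.
pvals = {"clicks": 0.01, "retention": 0.03, "revenue": 0.04, "session_time": 0.20}
print(bonferroni(pvals))
print(holm(pvals))
```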
What to measure
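Pre-register the primary metric. Secondary metrics should move in the expected direction and serve as supporting evidence, not as alternative ways to declare a win; otherwise you have a multiple-comparisons problem. Always check that guardrail metrics (revenue, latency, error rates) didn't degrade.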
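Once the planned sample size is reached, the pre-registered primary metric is analyzed once. For a conversion-rate metric, a standard choice is the two-sided two-proportion z-test with a pooled variance under the null hypothesis. This sketch assumes that metric type, and `two_proportion_ztest` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_c: int, n_c: int, conv_t: int, n_t: int) -> tuple[float, float]:
    """Return (absolute lift, two-sided p-value) for treatment vs. control conversion rates."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)  # common rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, p_value


lift, p = two_proportion_ztest(conv_c=1_000, n_c=10_000, conv_t=1_100, n_t=10_000)
print(f"lift = {lift:.3f}, p = {p:.3f}")
```

With 1,000 of 10,000 control users converting and 1,100 of 10,000 treatment users, the lift is 1 percentage point with a two-sided p-value of about 0.02.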
When NOT to A/B test
- Effect too small to detect with feasible sample size
- Ethical issues with random assignment (medical, financial harm)
- Long-term effects (years): use long-running holdouts or quasi-experiments instead
- Network effects break the randomization assumption: consider cluster-randomized or switchback designs
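The first bullet can be checked before launch by inverting the two-proportion sample-size approximation. Given the traffic you can realistically get, compute the smallest absolute effect the test could detect. `minimum_detectable_effect` is a hypothetical helper. It assumes a conversion-rate metric, a two-sided α = 0.05, and 80% power.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(baseline: float, n_per_group: int,
                              alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest absolute lift detectable with `n_per_group` users in each group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Solve n = 2 (z_alpha + z_beta)^2 p (1 - p) / MDE^2 for MDE.
    return (z_alpha + z_beta) * math.sqrt(2 * baseline * (1 - baseline) / n_per_group)


mde = minimum_detectable_effect(baseline=0.05, n_per_group=2_000)
print(f"MDE = {mde:.4f} absolute ({mde / 0.05:.0%} relative)")
```

With a 5% baseline and 2,000 users per group, the MDE is roughly 1.9 percentage points, a relative lift of almost 40%. If realistic effects are a few percent relative, the test isn't worth running.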