For non-conjugate models, the posterior generally has no closed form, and in high dimensions the normalizing integral is intractable to compute directly.
Markov chain Monte Carlo (MCMC) sidesteps the issue by constructing a Markov chain whose stationary distribution is the target posterior. Run the chain long enough and its draws behave like (correlated) samples from the posterior, which you use in place of the analytical distribution to estimate means, quantiles, and other summaries.
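To make the role of the normalizing integral concrete, the relations below write out Bayes' rule next to the Monte Carlo estimate that the chain's draws feed. The notation is an assumption introduced here for illustration: $\theta$ for the parameters, $y$ for the data, $f$ for any function of interest, and $\theta^{(s)}$ for the $s$-th of $S$ retained draws.

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'},
\qquad
\mathbb{E}[f(\theta) \mid y] \approx \frac{1}{S} \sum_{s=1}^{S} f\big(\theta^{(s)}\big).
$$

The denominator is the intractable integral. MCMC typically needs only the numerator, that is, the posterior up to a constant, because the constant cancels in the quantities the samplers actually evaluate.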
Three workhorse algorithms:
- Metropolis–Hastings proposes random moves and accepts them with a probability that preserves detailed balance.
- Gibbs sampling cycles through parameters, sampling each from its full conditional given the others.
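- Hamiltonian Monte Carlo uses gradients of the log posterior to simulate Hamiltonian dynamics, producing distant proposals that are still accepted with high probability, which lets it explore high-dimensional posteriors efficiently.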
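As a rough illustration of the first two algorithms, here is a minimal sketch using only the Python standard library. Everything in it is an assumption chosen for illustration rather than a reference implementation: the standard-normal target, the bivariate-normal Gibbs example, the function names, the step size of 2.4, and the 2,000-draw warm-up. Hamiltonian Monte Carlo is left out because it also needs gradients and a numerical integrator.

```python
import math
import random

def log_target(x):
    # Unnormalized log-density of a standard normal (illustrative target).
    return -0.5 * x * x

def random_walk_metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Metropolis-Hastings with a symmetric Gaussian random-walk proposal."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples, n_accept = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_p(proposal)
        # The proposal is symmetric, so the acceptance probability reduces to
        # min(1, p(proposal) / p(x)); the normalizing constant cancels.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            n_accept += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, n_accept / n_samples

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # y | x ~ N(rho * x, 1 - rho^2)
        samples.append((x, y))
    return samples

mh_samples, accept_rate = random_walk_metropolis(log_target, 0.0, 20_000, step=2.4)
mh_kept = mh_samples[2_000:]  # discard warm-up draws
mh_mean = sum(mh_kept) / len(mh_kept)
mh_var = sum((v - mh_mean) ** 2 for v in mh_kept) / len(mh_kept)

gibbs_kept = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)[2_000:]
gibbs_exy = sum(x * y for x, y in gibbs_kept) / len(gibbs_kept)

print(f"MH: mean={mh_mean:.3f}, var={mh_var:.3f}, accept={accept_rate:.2f}")
print(f"Gibbs: E[xy]={gibbs_exy:.3f} (target correlation 0.8)")
```

Both samplers touch the target only through quantities that do not require the normalizing constant: a log-density difference for Metropolis–Hastings, and the full conditionals for Gibbs. Working in log space for the acceptance test is a common choice to avoid numerical underflow.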
Modern probabilistic programming languages (Stan, PyMC, NumPyro) hide most of the algorithmic mechanics. The harder skills are now diagnosing convergence (R-hat, effective sample size, divergent transitions) and specifying models that capture the structure of the problem.
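To give a feel for what a convergence diagnostic measures, the sketch below computes the classic Gelman-Rubin form of R-hat by comparing between-chain and within-chain variance. It is an illustrative assumption, not the exact statistic any particular library reports: current tools generally use refined split or rank-normalized variants. The function name and the synthetic chains are likewise made up for the example.

```python
import random
import statistics

def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    var_plus = (n - 1) / n * within + between / n  # pooled variance estimate
    return (var_plus / within) ** 0.5

rng = random.Random(1)

# Four synthetic chains that all sample the same N(0, 1) distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four synthetic chains stuck in two different regions (means 0 and 3).
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(f"well-mixed chains: R-hat = {rhat_mixed:.3f}")
print(f"stuck chains:      R-hat = {rhat_stuck:.3f}")
```

Values close to 1 indicate that the chains agree with one another. Values clearly above 1 mean the chains are exploring different regions, so their pooled draws should not be trusted as posterior samples.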