MCMC and Modern Computation — Section 10: Bayesian Inference

For non-conjugate models, the posterior generally has no closed form, and in high dimensions the normalizing integral $P(x) = \int P(x \mid \theta) P(\theta)\, d\theta$ is intractable to compute directly.

Markov Chain Monte Carlo (MCMC) sidesteps the issue by constructing a Markov chain whose stationary distribution is the target posterior. Run the chain long enough and its draws behave like (correlated) samples from the posterior, which you use as a stand-in for the analytical distribution, for example by averaging over them to estimate posterior means and intervals.
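
One way to see why a sampler can avoid the normalizing integral: by Bayes' rule, for any two parameter values $\theta$ and $\theta'$ (the symbol $\theta'$ is introduced here only for illustration),

$$
\frac{P(\theta' \mid x)}{P(\theta \mid x)}
= \frac{P(x \mid \theta')\, P(\theta') / P(x)}{P(x \mid \theta)\, P(\theta) / P(x)}
= \frac{P(x \mid \theta')\, P(\theta')}{P(x \mid \theta)\, P(\theta)}.
$$

The intractable $P(x)$ cancels, so a chain whose transitions depend on the posterior only through ratios like this one needs just the likelihood and the prior.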

Three workhorse algorithms:

Metropolis–Hastings proposes random moves and accepts them with a probability that preserves detailed balance.
Gibbs sampling cycles through parameters, sampling each from its full conditional given the others.
Hamiltonian Monte Carlo uses gradient information to make efficient, long, low-rejection moves through high-dimensional posteriors.
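
To make the first of these concrete, here is a minimal sketch of random-walk Metropolis, the special case of Metropolis–Hastings with a symmetric proposal. It is an illustration under stated assumptions, not a production sampler: the target is a one-dimensional standard normal chosen only because its mean and variance are known, and the names `log_unnormalized_posterior`, `metropolis_hastings`, `step`, and `theta0` are hypothetical. The sampler only ever evaluates the log density up to an additive constant.

```python
import math
import random


def log_unnormalized_posterior(theta):
    """Log of likelihood times prior, up to an additive constant.

    Assumed target for illustration: a standard normal (mean 0, variance 1).
    """
    return -0.5 * theta * theta


def metropolis_hastings(log_target, n_samples, step=1.0, theta0=0.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    Returns the list of draws and the fraction of proposals accepted.
    """
    rng = random.Random(seed)
    theta = theta0
    log_p = log_target(theta)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Propose a move centred on the current state.
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the Hastings correction cancels and
        # the acceptance probability is min(1, p(proposal) / p(theta)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            theta, log_p = proposal, log_p_new
            accepted += 1
        # A rejected proposal repeats the current state in the chain.
        samples.append(theta)
    return samples, accepted / n_samples
```

Calling `metropolis_hastings(log_unnormalized_posterior, 20000)` should give draws whose sample mean and variance are close to 0 and 1. The `step` parameter trades acceptance rate against move size: very small steps are almost always accepted but explore slowly, while very large steps are mostly rejected.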

Modern probabilistic programming languages (Stan, PyMC, NumPyro) hide most of the algorithmic mechanics. The harder skills are now diagnosing convergence (R-hat, effective sample size, divergent transitions) and specifying models that capture the structure of the problem.
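
As a rough illustration of what a convergence diagnostic measures, the sketch below computes the classic Gelman–Rubin R-hat for one scalar parameter from several equal-length chains. It compares between-chain and within-chain variance; values near 1 suggest the chains agree. This is the textbook form only: the libraries named above typically report refined variants (for example split or rank-normalized R-hat), and the function name `gelman_rubin_rhat` is hypothetical.

```python
import statistics


def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor.

    `chains` is a list of equal-length lists of draws for one scalar parameter.
    """
    n = len(chains[0])
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5
```

Chains that have all settled on the same distribution give a value close to 1, while chains stuck in different regions inflate the between-chain term and push the value well above 1.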
