A Brief Introduction to Markov Chain Monte Carlo (MCMC)

Context

  • Scientific Domain: MCMC is a cornerstone of computational statistics and Bayesian inference, with profound applications across machine learning, statistical physics, computational biology, and econometrics. It sits at the intersection of probability theory, statistics, and computer science.
  • Inception of the Field: The foundational MCMC algorithm, the Metropolis algorithm, was introduced in 1953. Its primary purpose was to solve difficult integration problems in statistical physics.
  • Key Researchers: The algorithm was developed by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller. The field was revolutionized later by the Gibbs sampler (Stuart and Donald Geman) and the generalized Metropolis-Hastings algorithm (W.K. Hastings).

Main Results

  • Mathematical Setup: The core problem is estimating expectations with respect to a target probability distribution $\pi(x)$ that we can evaluate only up to a normalizing constant. We want to compute: $$ \mathbb{E}_{\pi}[f] = \int f(x)\, \pi(x)\, dx $$ For example, in Bayesian statistics, the posterior is $\pi(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta)\, P(\theta)$. MCMC constructs an ergodic Markov chain $\{X_0, X_1, \dots, X_n\}$ whose stationary distribution is $\pi$. By the Ergodic Theorem, the sample average converges to the expectation: $$ \frac{1}{n} \sum_{t=1}^{n} f(X_t) \to \mathbb{E}_{\pi}[f] \quad \text{as } n \to \infty $$
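As a small illustration of "known only up to a normalizing constant": the coin-flip data and flat prior below are a hypothetical example (not from the original), chosen because the unnormalized posterior $\theta^{h}(1-\theta)^{n-h}$ can be evaluated without ever computing the normalizing integral.

```python
import math

# Hypothetical example: a coin lands heads 6 times in n = 9 flips,
# with a flat prior on theta. The posterior is known only up to a
# constant:  pi(theta | data) ∝ theta^heads * (1 - theta)^(n - heads).
def log_unnorm_posterior(theta, heads=6, n=9):
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero density outside the support
    return heads * math.log(theta) + (n - heads) * math.log(1.0 - theta)
```

Working with the log of the unnormalized density, as here, is the usual practice: it avoids underflow and turns the acceptance ratio in Metropolis-Hastings into a simple subtraction.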
  • The Metropolis-Hastings Algorithm: The key result is that a Markov chain with a desired stationary distribution $\pi$ can be constructed using only a simpler proposal distribution $q(x' \mid x)$. The main algorithmic achievement is the Metropolis-Hastings algorithm:
    1. Given current state $X_t = x$, propose a new state $x'$ from $q(x' \mid x)$.
    2. Calculate the acceptance probability: $$ \alpha(x' \mid x) = \min \left( 1, \frac{\pi(x') \, q(x \mid x')}{\pi(x) \, q(x' \mid x)} \right) $$
    3. Accept the move $X_{t+1} = x'$ with probability $\alpha$; otherwise, set $X_{t+1} = x$.
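The three steps above can be sketched in a few lines of Python. This is a minimal random-walk Metropolis sampler (a hedged illustration, not a production implementation): the Gaussian proposal is symmetric, so the $q$-ratio in the acceptance probability cancels and only the $\pi$-ratio remains.

```python
import math
import random

def metropolis_hastings(log_pi, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposal q, so
    q(x|x')/q(x'|x) = 1 and alpha = min(1, pi(x')/pi(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_prop = x + rng.gauss(0.0, step)        # 1. propose x' ~ q(. | x)
        log_alpha = log_pi(x_prop) - log_pi(x)   # 2. log acceptance ratio
        if math.log(rng.random()) < log_alpha:   # 3. accept with prob. alpha
            x = x_prop                           #    ... else keep current x
        samples.append(x)
    return samples

# Target: a standard normal, passed in unnormalized (the 1/sqrt(2*pi)
# constant is deliberately omitted -- it cancels in the ratio).
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000)
mean = sum(samples) / len(samples)  # ergodic average; approaches E[X] = 0
```

Note that rejected proposals still append the current state: repeating $X_{t+1} = X_t$ on rejection is part of the algorithm, not an inefficiency to be skipped.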

The normalizing constant of $\pi$ cancels in the ratio, making it computable. The Gibbs sampler is a special case where proposals are from full conditional distributions, leading to an acceptance probability of 1.
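The Gibbs special case can be made concrete with a bivariate normal target, a standard textbook example (an illustrative sketch, not from the original): each full conditional of a bivariate standard normal with correlation $\rho$ is itself normal, $N(\rho \cdot \text{other}, 1 - \rho^2)$, and because we draw from the conditionals exactly, every update is accepted with probability 1.

```python
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampling for a bivariate standard normal with correlation rho.
    The full conditionals are x|y ~ N(rho*y, 1-rho^2) and
    y|x ~ N(rho*x, 1-rho^2); sampling them exactly means the
    Metropolis-Hastings acceptance probability is always 1."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)  # update x from its full conditional
        y = rng.gauss(rho * x, sd)  # update y from its full conditional
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_steps=50_000)
```

The sample correlation of the resulting pairs should approach $\rho$, and each marginal mean should approach 0, as the Ergodic Theorem predicts.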

References

  • Foundational:
    • Metropolis, N., et al. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics.
    • Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika.
  • Seminal (Gibbs Sampler):
    • Geman, S., & Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Key Textbook:
    • Robert, C. P., & Casella, G. (2004). Monte Carlo Statistical Methods. Springer.