Background

Monte Carlo integration refers to the practice of estimating an expectation with a sample mean. For example, given a random variable Z in R^k with density p, the expectation of a function f can be approximated as follows:

E_p[f(Z)] = \int f(z) p(z) dz
          \approx S_n := n^{-1} \sum_{i=1}^n f(z_i),   z_i i.i.d. samples from p.

If E_p[|f(Z)|] < infinity, then S_n --> E_p[f(Z)] almost surely by the strong law of large numbers. If additionally E_p[f(Z)^2] < infinity, then by the central limit theorem S_n is asymptotically normal around E_p[f(Z)] with variance Var[f(Z)] / n.
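
To make the estimator concrete, here is a minimal NumPy sketch (the choice of Z ~ N(0, 1) and f(z) = z^2 is illustrative, not from the text above); it estimates E[f(Z)] = 1 with a sample mean and reports a CLT-based error bar.

# Monte Carlo estimate of E[f(Z)] for Z ~ N(0, 1), f(z) = z^2 (true value: 1).
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    return z ** 2

n = 100_000
z = rng.standard_normal(n)   # i.i.d. samples from p = N(0, 1)
s_n = f(z).mean()            # the sample-mean estimator S_n

# By the CLT, S_n has approximate standard error sqrt(Var[f(Z)] / n).
stderr = f(z).std(ddof=1) / np.sqrt(n)
print(f"estimate = {s_n:.4f} +/- {1.96 * stderr:.4f}  (true value: 1)")

With n = 100,000 samples the error bar comes out to roughly +/- 0.01, shrinking at the usual O(n^{-1/2}) Monte Carlo rate.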

Practitioners of Bayesian statistics often find themselves wanting to estimate E_p[f(Z)] when the distribution p is known only up to a normalizing constant. For example, the joint density p(z, x) may be known, but the evidence p(x) = \int p(z, x) dz may be intractable. In that case, a parameterized distribution family q_lambda(z) may be chosen, and the optimal lambda is the one minimizing the KL divergence between q_lambda(z) and the posterior p(z | x). Knowing only p(z, x) is sufficient to find lambda, because

KL(q_lambda || p(. | x)) = E_{q_lambda}[\log q_lambda(Z) - \log p(Z, x)] + \log p(x),

and the \log p(x) term does not depend on lambda; minimizing the KL divergence is therefore equivalent to maximizing the evidence lower bound (ELBO), E_{q_lambda}[\log p(Z, x) - \log q_lambda(Z)].
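
As a sanity check of this claim, here is a toy sketch (the conjugate Gaussian model, the variational family, and all names are illustrative assumptions, not from the text above): with prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), and observed x = 2, the posterior p(z | x) = N(1, 1/2) is known in closed form, so we can verify that the Monte Carlo ELBO, computed from \log p(z, x) alone, is maximized near the true posterior mean.

# Toy conjugate Gaussian model: z ~ N(0, 1), x | z ~ N(z, 1), observed x = 2,
# so the posterior is p(z | x) = N(1, 1/2). We estimate the ELBO
# E_q[log p(z, x) - log q(z)] by Monte Carlo, using only the joint density.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 2.0

def log_joint(z):
    # log p(z, x) = log p(z) + log p(x | z); no evidence p(x) needed.
    return norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)

def elbo(mu, sigma, n=50_000):
    z = rng.normal(mu, sigma, size=n)   # samples from q_lambda = N(mu, sigma^2)
    return np.mean(log_joint(z) - norm.logpdf(z, mu, sigma))

# Grid search over the variational mean; sigma is fixed at the true
# posterior scale sqrt(1/2) to keep the example short.
mus = np.linspace(0.0, 2.0, 21)
best = max(mus, key=lambda mu: elbo(mu, np.sqrt(0.5)))
print(f"ELBO-maximizing mean: {best:.2f}  (true posterior mean: 1.0)")

In practice one would optimize the ELBO with stochastic gradients (e.g. via the reparameterization trick) rather than a grid search; the grid here only keeps the sketch self-contained.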