Models the distribution of finitely many importance-reweighted samples.
```python
tfp.experimental.distributions.ImportanceResample(
    proposal_distribution,
    target_log_prob_fn,
    importance_sample_size,
    sample_size=1,
    stochastic_approximation_seed=None,
    validate_args=False,
    name=None
)
```
This wrapper adapts a proposal distribution towards a target density using importance resampling. Given a proposal `q`, a target density `p` (which may be unnormalized), and an `importance_sample_size`, it models the result of the following procedure:

- Draw `importance_sample_size` samples `z[k] ~ q` from the proposal.
- Compute an importance weight `w[k] = p(z[k]) / q(z[k])` for each sample.
- Return a sample `z[k*]` selected with probability proportional to the importance weights, i.e., with `k* ~ Categorical(probs=w/sum(w))`.
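The procedure above can be sketched in plain NumPy. This is an illustrative sketch only, not the library's implementation; the function name and the standard-normal target with a wide normal proposal are assumptions for the example:

```python
import numpy as np

def importance_resample(proposal_sample, proposal_log_prob, target_log_prob,
                        importance_sample_size, rng):
  """Draws one sample via sampling/importance resampling (illustrative sketch)."""
  # Draw `importance_sample_size` candidates z[k] ~ q.
  z = proposal_sample(importance_sample_size, rng)
  # Log importance weights log w[k] = log p(z[k]) - log q(z[k]); any
  # normalizing constants cancel when the weights are normalized below.
  log_w = target_log_prob(z) - proposal_log_prob(z)
  probs = np.exp(log_w - np.logaddexp.reduce(log_w))  # normalized weights
  # Resample one candidate with probability proportional to its weight.
  k_star = rng.choice(importance_sample_size, p=probs)
  return z[k_star]

rng = np.random.default_rng(0)
sample = importance_resample(
    proposal_sample=lambda n, rng: rng.normal(0., 2., size=n),
    proposal_log_prob=lambda z: -0.5 * (z / 2.)**2,  # unnormalized N(0, 2)
    target_log_prob=lambda z: -0.5 * z**2,           # unnormalized N(0, 1)
    importance_sample_size=100, rng=rng)
```

With `importance_sample_size=100`, the returned sample is approximately distributed according to the (normalized) target density.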
In the limit where `importance_sample_size -> inf`, the result `z[k*]` of this procedure is distributed according to the target density `p`. On the other hand, if `importance_sample_size == 1`, then the reweighting has no effect and the result `z[k*]` is simply a sample from `q`. Finite values of `importance_sample_size` describe distributions intermediate between `q` and `p`.
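These two limits can be checked numerically. The sketch below (an assumed example with an unnormalized `N(2, 1)` target and an `N(0, 2)` proposal, not taken from the library) shows the mean of resampled draws moving from the proposal's mean toward the target's mean as `importance_sample_size` grows:

```python
import numpy as np

rng = np.random.default_rng(0)
target_log_prob = lambda z: -0.5 * (z - 2.)**2          # unnormalized N(2, 1)
proposal_log_prob = lambda z: -0.5 * (z / 2.)**2        # unnormalized N(0, 2)

def resampled_draws(importance_sample_size, n_draws):
  # Propose a full batch of candidates: shape [n_draws, importance_sample_size].
  z = rng.normal(0., 2., size=(n_draws, importance_sample_size))
  log_w = target_log_prob(z) - proposal_log_prob(z)
  w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
  w /= w.sum(axis=1, keepdims=True)
  # Vectorized categorical draw per row via the inverse-CDF trick.
  u = rng.random((n_draws, 1))
  idx = np.minimum((w.cumsum(axis=1) < u).sum(axis=1),
                   importance_sample_size - 1)
  return z[np.arange(n_draws), idx]

mean_k1 = resampled_draws(1, 4000).mean()      # ~0: just the proposal mean
mean_k200 = resampled_draws(200, 4000).mean()  # close to the target mean, ~2
```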
This distribution may also be understood as an explicit representation of the surrogate posterior that is implicitly assumed by importance-weighted variational objectives [1, 2].
This distribution can be used directly for posterior inference via importance sampling:
```python
tfd = tfp.distributions
tfed = tfp.experimental.distributions

def target_log_prob_fn(x):
  prior = tfd.Normal(loc=0., scale=1.).log_prob(x)
  # Multimodal likelihood.
  likelihood = tf.reduce_logsumexp(
      tfd.Normal(loc=x, scale=0.1).log_prob([-1., 1.]))
  return prior + likelihood

# Use importance sampling to infer an approximate posterior.
approximate_posterior = tfed.ImportanceResample(
    proposal_distribution=tfd.Normal(loc=0., scale=2.),
    target_log_prob_fn=target_log_prob_fn,
    importance_sample_size=100)
```
We can estimate posterior expectations directly using an importance-weighted sum of proposal samples:
```python
# Directly compute expectations under the posterior via importance weights.
posterior_mean = approximate_posterior.self_normalized_expectation(
    lambda x: x, importance_sample_size=1000)
posterior_variance = approximate_posterior.self_normalized_expectation(
    lambda x: (x - posterior_mean)**2, importance_sample_size=1000)
```
Alternately, the same expectations can be estimated from explicit (unweighted) samples. Note that sampling may be expensive because it performs resampling internally: to produce `sample_size` samples, the distribution first proposes `sample_size * importance_sample_size` values (of shape `[1000, 100]` in the code below) and then resamples down to `sample_size`, throwing most of the proposals away. For this reason you should prefer calling `self_normalized_expectation` over naive sampling to compute expectations.
```python
posterior_samples = approximate_posterior.sample(1000)
posterior_mean_inefficient = tf.reduce_mean(posterior_samples)
posterior_variance_inefficient = tf.math.reduce_variance(posterior_samples)

# Calling `self_normalized_expectation` allows for a much lower `sample_size`
# because it uses the full set of `importance_sample_size` proposal samples to
# approximate the expectation at each of the `sample_size` Monte Carlo
# evaluations. This is formalized in Eq. 9 of [3].
posterior_mean_efficient = approximate_posterior.self_normalized_expectation(
    lambda x: x, sample_size=10)
posterior_variance_efficient = (
    approximate_posterior.self_normalized_expectation(
        lambda x: (x - posterior_mean_efficient)**2, sample_size=10))
```
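The efficiency gap can be illustrated in plain NumPy, independent of the TFP API. In this sketch (an assumed `N(1, 1)` target with an `N(0, 2)` proposal, not the library's implementation), a self-normalized weighted average uses all `K` proposals per estimate, while resampling keeps only one of the `K` proposals, yielding a much higher-variance estimator of the same posterior mean:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_reps = 100, 2000
target_log_prob = lambda z: -0.5 * (z - 1.)**2  # unnormalized N(1, 1)

# For each repetition, draw K proposals and compute normalized weights.
z = rng.normal(0., 2., size=(n_reps, K))            # proposals from N(0, 2)
log_w = target_log_prob(z) - (-0.5 * (z / 2.)**2)   # log p - log q (constants cancel)
w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)

# Self-normalized estimate of the posterior mean: uses all K proposals.
snis_estimates = (w * z).sum(axis=1)

# Resampling-based estimate: keeps a single proposal per repetition.
u = rng.random((n_reps, 1))
idx = np.minimum((w.cumsum(axis=1) < u).sum(axis=1), K - 1)
resampled = z[np.arange(n_reps), idx]

# Both estimators target the posterior mean (~1), but the self-normalized
# estimates cluster far more tightly around it than single resampled draws.
print(snis_estimates.var(), resampled.var())
```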
The posterior (log-)density cannot be computed directly, but it may be stochastically approximated: the `prob` and `log_prob` methods accept a `sample_size` argument to control the variance of the approximation.
```python
# Plot the posterior density.
from matplotlib import pylab as plt
xs = tf.linspace(-3., 3., 101)
probs = approximate_posterior.prob(xs, sample_size=10, seed=(42, 42))
plt.plot(xs, probs)
```
Connections to importance-weighted variational inference
Optimizing an importance-weighted variational bound provides a natural approach to choosing a proposal distribution for importance sampling. Importance-weighted bounds are available directly in TFP via the `importance_sample_size` argument to `tfp.vi.fit_surrogate_posterior`. For example, we might improve on the example above by replacing the fixed proposal distribution with a learned proposal:
```python
proposal_distribution = tfp.experimental.util.make_trainable(tfd.Normal)
importance_sample_size = 100

importance_weighted_losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn,
    surrogate_posterior=proposal_distribution,
    optimizer=tf.optimizers.Adam(0.1),
    num_steps=200,
    importance_sample_size=importance_sample_size)
approximate_posterior = tfed.ImportanceResample(
    proposal_distribution=proposal_distribution,
    target_log_prob_fn=target_log_prob_fn,
    importance_sample_size=importance_sample_size)
```
Note that although the importance-resampled distribution ultimately serves as the surrogate posterior, only the bare proposal distribution is passed as the `surrogate_posterior` argument to `fit_surrogate_posterior`. This is because the `importance_sample_size` argument tells `fit_surrogate_posterior` to compute an importance-weighted bound directly from the proposal distribution. Mathematically, it would be equivalent to omit the `importance_sample_size` argument and instead pass an `ImportanceResample` distribution as the surrogate posterior:
```python
equivalent_but_less_efficient_losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn,
    surrogate_posterior=tfed.ImportanceResample(
        proposal_distribution=proposal_distribution,
        target_log_prob_fn=target_log_prob_fn,
        importance_sample_size=importance_sample_size),
    optimizer=tf.optimizers.Adam(0.1),
    num_steps=200)
```
but this approach is not recommended, because it performs redundant evaluations of the `target_log_prob_fn` compared to the direct bound shown above.
References:

[1] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance Weighted Autoencoders. In International Conference on Learning Representations, 2016. https://arxiv.org/abs/1509.00519
[2] Chris Cremer, Quaid Morris, and David Duvenaud. Reinterpreting Importance-Weighted Autoencoders. In International Conference on Learning Representations, Workshop track, 2017. https://arxiv.org/abs/1704.02916
[3] Justin Domke and Daniel Sheldon. Importance Weighting and Variational Inference. In Neural Information Processing Systems (NIPS), 2018. https://arxiv.org/abs/1808.09034
Attributes:

- `allow_nan_stats`: Stats return +/- infinity when it makes sense. E.g., the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined, e.g., if a distribution's pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. E.g., the mean for Student's T for `df = 1` is undefined (no clear way to say it is either + or - infinity), so the variance `E[(X - mean)**2]` is also undefined.
- `batch_shape`: Shape of a single sample from a single event index as a `TensorShape`. May be partially defined or unknown. The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.
- `event_shape`: Shape of a single sample from a single batch as a `TensorShape`. May be partially defined or unknown.
- `experimental_shard_axis_names`: The list or structure of lists of active shard axis names.
- `name`: Name prepended to all ops created by this `Distribution`.
- `non_trainable_variables`: Sequence of non-trainable variables owned by this module and its submodules.
- `parameters`: Dictionary of parameters used to instantiate this `Distribution`.
- `reparameterization_type`: Describes how samples from the distribution are reparameterized. Currently this is one of the static instances `tfd.FULLY_REPARAMETERIZED` or `tfd.NOT_REPARAMETERIZED`.
- `submodules`: Sequence of all sub-modules. Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).
- `trainable_variables`: Sequence of trainable variables owned by this module and its submodules.
- `variables`: Sequence of variables owned by this module and its submodules.