Join TensorFlow at Google I/O, May 11-12

# tf.contrib.distributions.ExpRelaxedOneHotCategorical

ExpRelaxedOneHotCategorical distribution with temperature and logits.

Inherits From: `Distribution`

An ExpRelaxedOneHotCategorical distribution is a log-transformed RelaxedOneHotCategorical distribution. The RelaxedOneHotCategorical is a distribution over random probability vectors, vectors of positive real values that sum to one, which continuously approximates a OneHotCategorical. The degree of approximation is controlled by a temperature: as the temperature goes to 0 the RelaxedOneHotCategorical becomes discrete with a distribution described by the logits, as the temperature goes to infinity the RelaxedOneHotCategorical becomes the constant distribution that is identically the constant vector of (1/event_size, ..., 1/event_size).

Because computing log-probabilities of the RelaxedOneHotCategorical can suffer from underflow issues, this class is one solution for loss functions that depend on log-probabilities, such as the KL Divergence found in the variational autoencoder loss. The KL divergence between two distributions is invariant under invertible transformations, so evaluating KL divergences of ExpRelaxedOneHotCategorical samples, which are always followed by a `tf.exp` op, is equivalent to evaluating KL divergences of RelaxedOneHotCategorical samples. See the appendix of Maddison et al., 2016 for more mathematical details, where this distribution is called the ExpConcrete.

#### Examples

Creates a continuous distribution, whose exp approximates a 3-class one-hot categorical distribution. The 2nd class is the most likely to be the largest component in samples drawn from this distribution. If those samples are followed by a `tf.exp` op, then they are distributed as a relaxed onehot categorical.

``````temperature = 0.5
p = [0.1, 0.5, 0.4]
dist = ExpRelaxedOneHotCategorical(temperature, probs=p)
samples = dist.sample()
exp_samples = tf.exp(samples)
# exp_samples has the same distribution as samples from
# RelaxedOneHotCategorical(temperature, probs=p)
``````

Creates a continuous distribution, whose exp approximates a 3-class one-hot categorical distribution. The 2nd class is the most likely to be the largest component in samples drawn from this distribution.

``````temperature = 0.5
logits = [-2, 2, 0]
dist = ExpRelaxedOneHotCategorical(temperature, logits=logits)
samples = dist.sample()
exp_samples = tf.exp(samples)
# exp_samples has the same distribution as samples from
# RelaxedOneHotCategorical(temperature, probs=p)
``````

Creates a continuous distribution, whose exp approximates a 3-class one-hot categorical distribution. Because the temperature is very low, samples from this distribution are almost discrete, with one component almost 0 and the others very negative. The 2nd class is the most likely to be the largest component in samples drawn from this distribution.

``````temperature = 1e-5
logits = [-2, 2, 0]
dist = ExpRelaxedOneHotCategorical(temperature, logits=logits)
samples = dist.sample()
exp_samples = tf.exp(samples)
# exp_samples has the same distribution as samples from
# RelaxedOneHotCategorical(temperature, probs=p)
``````

Creates a continuous distribution, whose exp approximates a 3-class one-hot categorical distribution. Because the temperature is very high, samples from this distribution are usually close to the (-log(3), -log(3), -log(3)) vector. The 2nd class is still the most likely to be the largest component in samples drawn from this distribution.

``````temperature = 10
logits = [-2, 2, 0]
dist = ExpRelaxedOneHotCategorical(temperature, logits=logits)
samples = dist.sample()
exp_samples = tf.exp(samples)
# exp_samples has the same distribution as samples from
# RelaxedOneHotCategorical(temperature, probs=p)
``````

Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. 2016.

`temperature` An 0-D `Tensor`, representing the temperature of a set of ExpRelaxedCategorical distributions. The temperature should be positive.
`logits` An N-D `Tensor`, `N >= 1`, representing the log probabilities of a set of ExpRelaxedCategorical distributions. The first `N - 1` dimensions index into a batch of independent distributions and the last dimension represents a vector of logits for each class. Only one of `logits` or `probs` should be passed in.
`probs` An N-D `Tensor`, `N >= 1`, representing the probabilities of a set of ExpRelaxedCategorical distributions. The first `N - 1` dimensions index into a batch of independent distributions and the last dimension represents a vector of probabilities for each class. Only one of `logits` or `probs` should be passed in.
`dtype` The type of the event samples (default: inferred from logits/probs).
`validate_args` Python `bool`, default `False`. When `True` distribution parameters are checked for validity despite possibly degrading runtime performance. When `False` invalid inputs may silently render incorrect outputs.
`allow_nan_stats` Python `bool`, default `True`. When `True`, statistics (e.g., mean, mode, variance) use the value "`NaN`" to indicate the result is undefined. When `False`, an exception is raised if one or more of the statistic's batch members are undefined.
`name` Python `str` name prefixed to Ops created by this class.

`allow_nan_stats` Python `bool` describing behavior when a stat is undefined.

Stats return +/- infinity when it makes sense. E.g., the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined, e.g., if a distribution's pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. E.g. the mean for Student's T for df = 1 is undefined (no clear way to say it is either + or - infinity), so the variance = E[(X - mean)**2] is also undefined.

`batch_shape` Shape of a single sample from a single event index as a `TensorShape`.

May be partially defined or unknown.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

`dtype` The `DType` of `Tensor`s handled by this `Distribution`.
`event_shape` Shape of a single sample from a single batch as a `TensorShape`.

May be partially defined or unknown.

`event_size` Scalar `int32` tensor: the number of classes.
`logits` Vector of coordinatewise logits.
`name` Name prepended to all ops created by this `Distribution`.
`parameters` Dictionary of parameters used to instantiate this `Distribution`.
`probs` Vector of probabilities summing to one.
`reparameterization_type` Describes how samples from the distribution are reparameterized.

Currently this is one of the static instances `distributions.FULLY_REPARAMETERIZED` or `distributions.NOT_REPARAMETERIZED`.

`temperature` Batchwise temperature tensor of a RelaxedCategorical.
`validate_args` Python `bool` indicating possibly expensive checks are enabled.

## Methods

### `batch_shape_tensor`

View source

Shape of a single sample from a single event index as a 1-D `Tensor`.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

Args
`name` name to give to the op

Returns
`batch_shape` `Tensor`.

### `cdf`

View source

Cumulative distribution function.

Given random variable `X`, the cumulative distribution function `cdf` is:

``````cdf(x) := P[X <= x]
``````

Args
`value` `float` or `double` `Tensor`.
`name` Python `str` prepended to names of ops created by this function.

Returns
`cdf` a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.

### `copy`

View source

Creates a deep copy of the distribution.

Args
`**override_parameters_kwargs` String/value dictionary of initialization arguments to override with new values.

Returns
`distribution` A new instance of `type(self)` initialized from the union of self.parameters and override_parameters_kwargs, i.e., `dict(self.parameters, **override_parameters_kwargs)`.

### `covariance`

View source

Covariance.

Covariance is (possibly) defined only for non-scalar-event distributions.

For example, for a length-`k`, vector-valued distribution, it is calculated as,

``````Cov[i, j] = Covariance(X_i, X_j) = E[(X_i - E[X_i]) (X_j - E[X_j])]
``````

where `Cov` is a (batch of) `k x k` matrix, `0 <= (i, j) < k`, and `E` denotes expectation.

Alternatively, for non-vector, multivariate distributions (e.g., matrix-valued, Wishart), `Covariance` shall return a (batch of) matrices under some vectorization of the events, i.e.,

``````Cov[i, j] = Covariance(Vec(X)_i, Vec(X)_j) = [as above]
``````

where `Cov` is a (batch of) `k' x k'` matrices, `0 <= (i, j) < k' = reduce_prod(event_shape)`, and `Vec` is some function mapping indices of this distribution's event dimensions to indices of a length-`k'` vector.

Args
`name` Python `str` prepended to names of ops created by this function.

Returns
`covariance` Floating-point `Tensor` with shape `[B1, ..., Bn, k', k']` where the first `n` dimensions are batch coordinates and `k' = reduce_prod(self.event_shape)`.

### `cross_entropy`

View source

Computes the (Shannon) cross entropy.

Denote this distribution (`self`) by `P` and the `other` distribution by `Q`. Assuming `P, Q` are absolutely continuous with respect to one another and permit densities `p(x) dr(x)` and `q(x) dr(x)`, (Shanon) cross entropy is defined as:

``````H[P, Q] = E_p[-log q(X)] = -int_F p(x) log q(x) dr(x)
``````

where `F` denotes the support of the random variable `X ~ P`.

Args
`other` `tfp.distributions.Distribution` instance.
`name` Python `str` prepended to names of ops created by this function.

Returns
`cross_entropy` `self.dtype` `Tensor` with shape `[B1, ..., Bn]` representing `n` different calculations of (Shanon) cross entropy.

### `entropy`

View source

Shannon entropy in nats.

### `event_shape_tensor`

View source

Shape of a single sample from a single batch as a 1-D int32 `Tensor`.

Args
`name` name to give to the op

Returns
`event_shape` `Tensor`.

### `is_scalar_batch`

View source

Indicates that `batch_shape == []`.

Args
`name` Python `str` prepended to names of ops created by this function.

Returns
`is_scalar_batch` `bool` scalar `Tensor`.

### `is_scalar_event`

View source

Indicates that `event_shape == []`.

Args
`name` Python `str` prepended to names of ops created by this function.

Returns
`is_scalar_event` `bool` scalar `Tensor`.

### `kl_divergence`

View source

Computes the Kullback--Leibler divergence.

Denote this distribution (`self`) by `p` and the `other` distribution by `q`. Assuming `p, q` are absolutely continuous with respect to reference measure `r`, the KL divergence is defined as:

``````KL[p, q] = E_p[log(p(X)/q(X))]
= -int_F p(x) log q(x) dr(x) + int_F p(x) log p(x) dr(x)
= H[p, q] - H[p]
``````

where `F` denotes the support of the random variable `X ~ p`, `H[., .]` denotes (Shanon) cross entropy, and `H[.]` denotes (Shanon) entropy.

Args
`other` `tfp.distributions.Distribution` instance.
`name` Python `str` prepended to names of ops created by this function.

Returns
`kl_divergence` `self.dtype` `Tensor` with shape `[B1, ..., Bn]` representing `n` different calculations of the Kullback-Leibler divergence.

### `log_cdf`

View source

Log cumulative distribution function.

Given random variable `X`, the cumulative distribution function `cdf` is:

``````log_cdf(x) := Log[ P[X <= x] ]
``````

Often, a numerical approximation can be used for `log_cdf(x)` that yields a more accurate a