
# tfp.substrates.numpy.distributions.Empirical


Empirical distribution.

Inherits From: `Distribution`

The Empirical distribution is parameterized by a [batch] multiset of samples. It describes the empirical measure (observations) of a variable.

#### Mathematical Details

The probability mass function (pmf) and cumulative distribution function (cdf) are

```
pmf(k; s1, ..., sn) = sum_i I(k)^{k == si} / n
I(k)^{k == si} == 1, if k == si, else 0.
cdf(k; s1, ..., sn) = sum_i I(k)^{k >= si} / n
I(k)^{k >= si} == 1, if k >= si, else 0.
```
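The two formulas above can be written directly in NumPy. A minimal sketch; the sample values below are made up for illustration and are not part of the API:

```python
import numpy as np

samples = np.array([0., 1., 1., 2.])  # s1, ..., sn with n = 4
n = samples.size

def empirical_pmf(k):
    # pmf(k) = (number of samples equal to k) / n
    return np.sum(samples == k) / n

def empirical_cdf(k):
    # cdf(k) = (number of samples <= k) / n
    return np.sum(samples <= k) / n

print(empirical_pmf(1.))  # 0.5  (two of the four samples equal 1.)
print(empirical_cdf(1.))  # 0.75 (three of the four samples are <= 1.)
```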

#### Examples

```
# Initialize an empirical distribution with 4 scalar samples.
dist = Empirical(samples=[0., 1., 1., 2.])
dist.cdf(1.)
==> 0.75
dist.prob([0., 1.])
==> [0.25, 0.5]  # `samples` is broadcast to
                 # [[0., 1., 1., 2.], [0., 1., 1., 2.]] to match the event.

# Initialize an empirical distribution with a [2] batch of scalar samples.
dist = Empirical(samples=[[0., 1.], [1., 2.]])
dist.cdf([0., 2.])
==> [0.5, 1.]
dist.prob(0.)
==> [0.5, 0.]  # The event is broadcast to [0., 0.] to match `samples`.

# Initialize an empirical distribution with 4 vector-like samples.
dist = Empirical(samples=[[0., 0.], [0., 1.], [0., 1.], [1., 2.]],
                 event_ndims=1)
dist.cdf([0., 1.])
==> 0.75
dist.prob([[0., 1.], [1., 2.]])
==> [0.5, 0.25]  # `samples` is broadcast to shape [2, 4, 2] to match the event.

# Initialize an empirical distribution with a [2] batch of vector samples.
dist = Empirical(samples=[[[0., 0.], [0., 1.]], [[0., 1.], [1., 2.]]],
                 event_ndims=1)
dist.cdf([[0., 0.], [0., 1.]])
==> [0.5, 0.5]
dist.prob([0., 1.])
==> [0.5, 1.]  # The event is broadcast to shape [[0., 1.], [0., 1.]]
               # to match `samples`.
```

Args
`samples` Numeric `Tensor` of shape `[B1, ..., Bk, S, E1, ..., En]`, `k, n >= 0`. Samples or batches of samples on which the distribution is based. The first `k` dimensions index into a batch of independent distributions. The length of the `S` dimension determines the number of samples in each multiset. The last `n` dimensions represent the samples for each distribution; `n` is specified by the `event_ndims` argument.
`event_ndims` Python `int32`, default `0`. Number of dimensions for each event. When `0`, this distribution has scalar samples. When `1`, this distribution has vector-like samples.
`validate_args` Python `bool`, default `False`. When `True` distribution parameters are checked for validity despite possibly degrading runtime performance. When `False` invalid inputs may silently render incorrect outputs.
`allow_nan_stats` Python `bool`, default `True`. When `True`, statistics (e.g., mean, mode, variance) use the value `NaN` to indicate the result is undefined. When `False`, an exception is raised if one or more of the statistic's batch members are undefined.
`name` Python `str` name prefixed to Ops created by this class.

Raises
`ValueError` if the rank of `samples` is statically known and less than `event_ndims + 1`.

Attributes
`allow_nan_stats` Python `bool` describing behavior when a stat is undefined.

Stats return +/- infinity when it makes sense. For example, the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined: if a distribution's pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. For example, the mean of Student's t with df = 1 is undefined (there is no clear way to say it is either + or - infinity), so the variance = E[(X - mean)**2] is also undefined.

`batch_shape` Shape of a single sample from a single event index as a `TensorShape`.

May be partially defined or unknown.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

`dtype` The `DType` of `Tensor`s handled by this `Distribution`.
`event_shape` Shape of a single sample from a single batch as a `TensorShape`.

May be partially defined or unknown.

`experimental_shard_axis_names` The list or structure of lists of active shard axis names.
`name` Name prepended to all ops created by this `Distribution`.
`parameters` Dictionary of parameters used to instantiate this `Distribution`.
`reparameterization_type` Describes how samples from the distribution are reparameterized.

Currently this is one of the static instances `tfd.FULLY_REPARAMETERIZED` or `tfd.NOT_REPARAMETERIZED`.

`samples` Distribution parameter.
`trainable_variables`

`validate_args` Python `bool` indicating possibly expensive checks are enabled.
`variables`

## Methods

### `batch_shape_tensor`


Shape of a single sample from a single event index as a 1-D `Tensor`.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

Args
`name` name to give to the op

Returns
`batch_shape` `Tensor`.

### `cdf`


Cumulative distribution function.

Given random variable `X`, the cumulative distribution function `cdf` is:

```
cdf(x) := P[X <= x]
```

Args
`value` `float` or `double` `Tensor`.
`name` Python `str` prepended to names of ops created by this function.
`**kwargs` Named arguments forwarded to subclass implementation.

Returns
`cdf` a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.

### `compute_num_samples`


Compute and return the number of values in `self.samples`.

Returns
`num_samples` int32 `Tensor` containing the number of entries in `self.samples`. If `self.samples` has shape `[..., S, E1, ..., Ee]`, where the `E`'s are event dims, this method returns a `Tensor` whose value is `S`.
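A minimal sketch of how `S` can be read off the shape `[B1, ..., Bk, S, E1, ..., En]`, assuming plain NumPy arrays (the helper name is hypothetical, not part of the API):

```python
import numpy as np

def num_samples(samples, event_ndims=0):
    # S sits on the axis just before the `event_ndims` trailing event axes.
    return samples.shape[samples.ndim - event_ndims - 1]

batch_of_vectors = np.zeros([2, 4, 3])  # batch [2], S = 4, event shape [3]
print(num_samples(batch_of_vectors, event_ndims=1))  # 4
```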

### `copy`


Creates a deep copy of the distribution.

Args
`**override_parameters_kwargs` String/value dictionary of initialization arguments to override with new values.

Returns
`distribution` A new instance of `type(self)` initialized from the union of self.parameters and override_parameters_kwargs, i.e., `dict(self.parameters, **override_parameters_kwargs)`.
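The documented merge amounts to an ordinary dictionary union, with overrides winning. A minimal illustration in plain Python; the parameter values are made up:

```python
# dict(self.parameters, **override_parameters_kwargs): keys in the
# overrides replace keys of the same name from the original parameters.
parameters = {'samples': [0., 1., 1., 2.], 'validate_args': False}
overrides = {'validate_args': True}

merged = dict(parameters, **overrides)
print(merged)  # {'samples': [0.0, 1.0, 1.0, 2.0], 'validate_args': True}
```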

### `covariance`


Covariance.

Covariance is (possibly) defined only for non-scalar-event distributions.

For example, for a length-`k`, vector-valued distribution, it is calculated as,

```
Cov[i, j] = Covariance(X_i, X_j) = E[(X_i - E[X_i]) (X_j - E[X_j])]
```

where `Cov` is a (batch of) `k x k` matrix, `0 <= (i, j) < k`, and `E` denotes expectation.

Alternatively, for non-vector, multivariate distributions (e.g., matrix-valued, Wishart), `Covariance` shall return a (batch of) matrices under some vectorization of the events, i.e.,

```
Cov[i, j] = Covariance(Vec(X)_i, Vec(X)_j) = [as above]
```

where `Cov` is a (batch of) `k' x k'` matrices, `0 <= (i, j) < k' = reduce_prod(event_shape)`, and `Vec` is some function mapping indices of this distribution's event dimensions to indices of a length-`k'` vector.
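To make the shape contract concrete, here is a sketch of a sample covariance for vector-valued samples in plain NumPy (an illustration of the formula above, not this class's implementation):

```python
import numpy as np

# Four length-2 vector samples: S = 4, k = 2. The sample values are made up.
samples = np.array([[0., 0.], [0., 1.], [0., 1.], [1., 2.]])

mean = samples.mean(axis=0)
centered = samples - mean
# Cov[i, j] = E[(X_i - E[X_i]) (X_j - E[X_j])], estimated over the S axis.
cov = centered.T @ centered / samples.shape[0]

print(cov.shape)  # (2, 2) -- a k x k matrix, matching the text above
```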

Args
`name` Python `str` prepended to names of ops created by this function.
`**kwargs` Named arguments forwarded to subclass implementation.

Returns
`covariance` Floating-point `Tensor` with shape `[B1, ..., Bn, k', k']` where the first `n` dimensions are batch coordinates and `k' = reduce_prod(self.event_shape)`.

### `cross_entropy`


Computes the (Shannon) cross entropy.

Denote this distribution (`self`) by `P` and the `other` distribution by `Q`. Assuming `P, Q` are absolutely continuous with respect to one another and permit densities `p(x) dr(x)` and `q(x) dr(x)`, (Shannon) cross entropy is defined as:

```
H[P, Q] = E_p[-log q(X)] = -int_F p(x) log q(x) dr(x)
```

where `F` denotes the support of the random variable `X ~ P`.
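The definition can be sketched by Monte Carlo: draw `X ~ P` and average `-log q(X)`. A minimal illustration using two normal distributions for `P` and `Q` (chosen only because their cross entropy has a closed form; they are not part of this class's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_logpdf(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# H[P, Q] = E_p[-log q(X)], estimated with samples X ~ P = N(0, 1)
# against Q = N(0.5, 1).
x = rng.normal(loc=0., scale=1., size=100_000)
h_pq = np.mean(-normal_logpdf(x, mu=0.5, sigma=1.))

# Closed form for normals:
# 0.5*log(2*pi*sigma_q^2) + (sigma_p^2 + (mu_p - mu_q)^2) / (2*sigma_q^2)
analytic = 0.5 * np.log(2 * np.pi) + 0.5 * (1. + 0.25)
print(h_pq, analytic)
```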

Args
`other` `tfp.distributions.Distribution` instance.
`name` Python `str` prepended to names of ops created by this function.

Returns
`cross_entropy` `self.dtype` `Tensor` with shape `[B1, ..., Bn]` representing