
```
import collections
import tensorflow as tf
tf.compat.v2.enable_v2_behavior()
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
```

## Basics

There are three important concepts associated with TensorFlow Distributions shapes:

- *Event shape* describes the shape of a single draw from the distribution; it may be dependent across dimensions. For scalar distributions, the event shape is `[]`. For a 5-dimensional MultivariateNormal, the event shape is `[5]`.
- *Batch shape* describes independent, not identically distributed draws, aka a "batch" of distributions.
- *Sample shape* describes independent, identically distributed draws of batches from the distribution family.

The event shape and the batch shape are properties of a `Distribution` object, whereas the sample shape is associated with a specific call to `sample` or `log_prob`.

This notebook's purpose is to illustrate these concepts through examples, so if this isn't immediately obvious, don't worry!

For another conceptual overview of these concepts, see this blog post.

### A note on TensorFlow Eager.

This entire notebook is written using TensorFlow Eager. None of the concepts presented *rely* on Eager, although with Eager, distribution batch and event shapes are evaluated (and therefore known) when the `Distribution` object is created in Python, whereas in graph (non-Eager) mode, it is possible to define distributions whose event and batch shapes are undetermined until the graph is run.

## Scalar Distributions

As we noted above, a `Distribution` object has defined event and batch shapes. We'll start with a utility to describe distributions:

```
def describe_distributions(distributions):
  print('\n'.join([str(d) for d in distributions]))
```

In this section we'll explore *scalar* distributions: distributions with an event shape of `[]`. A typical example is the Poisson distribution, specified by a `rate`:

```
poisson_distributions = [
    tfd.Poisson(rate=1., name='One Poisson Scalar Batch'),
    tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons'),
    tfd.Poisson(rate=[[1., 10., 100.,], [2., 20., 200.]],
                name='Two-by-Three Poissons'),
    tfd.Poisson(rate=[1.], name='One Poisson Vector Batch'),
    tfd.Poisson(rate=[[1.]], name='One Poisson Expanded Batch')
]
describe_distributions(poisson_distributions)
```

tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32) tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32) tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32) tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32)

The Poisson distribution is a scalar distribution, so its event shape is always `[]`. If we specify more rates, these show up in the batch shape. The final pair of examples is interesting: there's only a single rate, but because that rate is embedded in a numpy array with non-empty shape, that shape becomes the batch shape.
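To make the link between the `rate` argument and the batch shape concrete, here is a minimal pure-Python sketch that reads off the shape of a nested list the way an array library would. The helper `nested_shape` is hypothetical (it is not part of TFP) and assumes the nesting is rectangular.

```python
def nested_shape(value):
  """Return the shape of a rectangular nested list, or () for a bare scalar.

  Hypothetical helper for illustration only; it assumes every sublist at a
  given depth has the same length.
  """
  shape = []
  while isinstance(value, (list, tuple)):
    shape.append(len(value))
    if not value:  # An empty list has no first element to descend into.
      break
    value = value[0]
  return tuple(shape)

# The `rate` arguments of the five Poisson distributions above, in order.
rates = [1., [1., 10., 100.], [[1., 10., 100.], [2., 20., 200.]], [1.], [[1.]]]
for rate in rates:
  print(nested_shape(rate))
```

The printed shapes line up with the `batch_shape` values reported above, including `(1,)` and `(1, 1)` for the two single-rate examples.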

The standard Normal distribution is also a scalar. Its event shape is `[]`, just like for the Poisson, but we'll play with it to see our first example of *broadcasting*. The Normal is specified using `loc` and `scale` parameters:

```
normal_distributions = [
    tfd.Normal(loc=0., scale=1., name='Standard'),
    tfd.Normal(loc=[0.], scale=1., name='Standard Vector Batch'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=1., name='Different Locs'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=[[1.], [5.]],
               name='Broadcasting Scale')
]
describe_distributions(normal_distributions)
```

tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32) tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32) tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32)

The interesting example above is the `Broadcasting Scale` distribution. The `loc` parameter has shape `[4]`, and the `scale` parameter has shape `[2, 1]`. Using Numpy broadcasting rules, the batch shape is `[2, 4]`. An equivalent (but less elegant and not-recommended) way to define the `"Broadcasting Scale"` distribution would be:

```
describe_distributions(
    [tfd.Normal(loc=[[0., 1., 2., 3.], [0., 1., 2., 3.]],
                scale=[[1., 1., 1., 1.], [5., 5., 5., 5.]])])
```

tfp.distributions.Normal("Normal", batch_shape=[2, 4], event_shape=[], dtype=float32)

We can see why the broadcasting notation is useful, although it's also a source of headaches and bugs.
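As a rough illustration of how the `[4]` and `[2, 1]` parameter shapes combine into a `[2, 4]` batch shape, the following pure-Python sketch applies the NumPy-style rule: align the shapes from the right, and let a dimension of size 1 stretch to match its partner. The function `broadcast_shape` is a hypothetical helper written for this note, not a TFP or TensorFlow API.

```python
def broadcast_shape(shape_a, shape_b):
  """Broadcast two shapes NumPy-style, aligning them from the right."""
  # Pad the shorter shape with 1s on the left so both have equal rank.
  rank = max(len(shape_a), len(shape_b))
  a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
  b = (1,) * (rank - len(shape_b)) + tuple(shape_b)
  result = []
  for dim_a, dim_b in zip(a, b):
    if dim_a == dim_b or dim_b == 1:
      result.append(dim_a)
    elif dim_a == 1:
      result.append(dim_b)
    else:
      raise ValueError(f'Incompatible shapes: {shape_a} vs {shape_b}')
  return tuple(result)

# `loc` has shape [4] and `scale` has shape [2, 1] in 'Broadcasting Scale'.
print(broadcast_shape((4,), (2, 1)))
```

The `ValueError` branch is where many of the headaches come from: shapes such as `[3]` and `[2]` cannot be reconciled, while shapes that happen to be compatible broadcast silently even when that was not the intent.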

### Sampling Scalar Distributions

There are two main things we can do with distributions: we can `sample` from them and we can compute `log_prob`s. Let's explore sampling first. The basic rule is that when we sample from a distribution, the resulting Tensor has shape `[sample_shape, batch_shape, event_shape]`, where `batch_shape` and `event_shape` are provided by the `Distribution` object, and `sample_shape` is provided by the call to `sample`. For scalar distributions, `event_shape = []`, so the Tensor returned from sample will have shape `[sample_shape, batch_shape]`. Let's try it:

```
def describe_sample_tensor_shape(sample_shape, distribution):
  print('Sample shape:', sample_shape)
  print('Returned sample tensor shape:',
        distribution.sample(sample_shape).shape)

def describe_sample_tensor_shapes(distributions, sample_shapes):
  for distribution in distributions:
    print(distribution)
    for sample_shape in sample_shapes:
      describe_sample_tensor_shape(sample_shape, distribution)
    print()

sample_shapes = [1, 2, [1, 5], [3, 4, 5]]
describe_sample_tensor_shapes(poisson_distributions, sample_shapes)
```

tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1,) Sample shape: 2 Returned sample tensor shape: (2,) Sample shape: [1, 5] Returned sample tensor shape: (1, 5) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5) tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 3) Sample shape: 2 Returned sample tensor shape: (2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 3) tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3) tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 1) Sample shape: 2 Returned sample tensor shape: (2, 1) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 1) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 1) tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 1, 1) Sample shape: 2 Returned sample tensor shape: (2, 1, 1) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 1, 1) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 1, 1)

```
describe_sample_tensor_shapes(normal_distributions, sample_shapes)
```

tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1,) Sample shape: 2 Returned sample tensor shape: (2,) Sample shape: [1, 5] Returned sample tensor shape: (1, 5) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5) tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 1) Sample shape: 2 Returned sample tensor shape: (2, 1) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 1) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 1) tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 4) Sample shape: 2 Returned sample tensor shape: (2, 4) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 4) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 4) tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 4) Sample shape: 2 Returned sample tensor shape: (2, 2, 4) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 4) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 4)

That's about all there is to say about `sample`: returned sample tensors have shape `[sample_shape, batch_shape, event_shape]`.
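The concatenation rule can be written down as plain shape arithmetic. The sketch below is only a restatement of the rule in pure Python; `sample_tensor_shape` is a hypothetical helper, and treating an integer sample shape `n` like `[n]` is an assumption based on the outputs shown above.

```python
def sample_tensor_shape(sample_shape, batch_shape, event_shape):
  """Concatenate sample, batch and event shapes into the result shape."""
  # An integer sample shape n is treated like the one-element shape [n].
  if isinstance(sample_shape, int):
    sample_shape = [sample_shape]
  return tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape)

print(sample_tensor_shape(2, [3], []))             # (2, 3)
print(sample_tensor_shape([1, 5], [2, 3], []))     # (1, 5, 2, 3)
print(sample_tensor_shape([3, 4, 5], [2, 4], []))  # (3, 4, 5, 2, 4)
```

These three cases correspond to `Three Poissons`, `Two-by-Three Poissons` and `Broadcasting Scale` in the printed results above.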

### Computing `log_prob` For Scalar Distributions

Now let's take a look at `log_prob`, which is somewhat trickier. `log_prob` takes as input a (non-empty) tensor representing the location(s) at which to compute the `log_prob` for the distribution. In the most straightforward case, this tensor will have a shape of the form `[sample_shape, batch_shape, event_shape]`, where `batch_shape` and `event_shape` match the batch and event shapes of the distribution. Recall once more that for scalar distributions, `event_shape = []`, so the input tensor has shape `[sample_shape, batch_shape]`. In this case, we get back a tensor of shape `[sample_shape, batch_shape]`:

```
three_poissons = tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons')
three_poissons
```

<tfp.distributions.Poisson 'Three_Poissons' batch_shape=[3] event_shape=[] dtype=float32>

```
three_poissons.log_prob([[1., 10., 100.], [100., 10., 1]]) # sample_shape is [2].
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -2.0785608, -3.2223587], [-364.73938 , -2.0785608, -95.39484 ]], dtype=float32)>

```
three_poissons.log_prob([[[[1., 10., 100.], [100., 10., 1.]]]]) # sample_shape is [1, 1, 2].
```

<tf.Tensor: shape=(1, 1, 2, 3), dtype=float32, numpy= array([[[[ -1. , -2.0785608, -3.2223587], [-364.73938 , -2.0785608, -95.39484 ]]]], dtype=float32)>

Note how in the first example, the input and output have shape `[2, 3]` and in the second example they have shape `[1, 1, 2, 3]`.

That would be all there was to say, if it weren't for broadcasting. Here are the rules once we take broadcasting into account. We describe it in full generality and note simplifications for scalar distributions:

- Define `n = len(batch_shape) + len(event_shape)`. (For scalar distributions, `len(event_shape)=0`.)
- If the input tensor `t` has fewer than `n` dimensions, pad its shape by adding dimensions of size `1` on the left until it has exactly `n` dimensions. Call the resulting tensor `t'`.
- Broadcast the `n` rightmost dimensions of `t'` against the `[batch_shape, event_shape]` of the distribution you're computing a `log_prob` for. In more detail: for the dimensions where `t'` already matches the distribution, do nothing, and for the dimensions where `t'` has a singleton, replicate that singleton the appropriate number of times. Any other situation is an error. (For scalar distributions, we only broadcast against `batch_shape`, since `event_shape = []`.)
- Now we're finally able to compute the `log_prob`. The resulting tensor will have shape `[sample_shape, batch_shape]`, where `sample_shape` is defined to be any dimensions of `t` or `t'` to the left of the `n`-rightmost dimensions: `sample_shape = shape(t)[:-n]`.
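The four rules above can be sketched as pure shape arithmetic. The function below, `log_prob_output_shape`, is a hypothetical helper that only predicts the shape of the result; it is not how TFP computes anything internally, and it also lets a size-1 dimension on the distribution side stretch, which is an assumption taken from ordinary NumPy broadcasting.

```python
def log_prob_output_shape(input_shape, batch_shape, event_shape=()):
  """Predict the shape returned by log_prob for a given input shape."""
  batch_shape, event_shape = tuple(batch_shape), tuple(event_shape)
  dist_shape = batch_shape + event_shape
  n = len(dist_shape)                                        # Rule 1.
  # Rule 2: left-pad the input shape with 1s until it has at least n dims.
  padded = (1,) * max(0, n - len(input_shape)) + tuple(input_shape)
  split = len(padded) - n
  sample_shape, rightmost = padded[:split], padded[split:]
  # Rule 3: broadcast the n rightmost dims against [batch_shape, event_shape].
  broadcast = []
  for t_dim, d_dim in zip(rightmost, dist_shape):
    if t_dim != d_dim and 1 not in (t_dim, d_dim):
      raise ValueError(f'Cannot broadcast {input_shape} against {dist_shape}')
    broadcast.append(d_dim if t_dim == 1 else t_dim)
  # Rule 4: event dims are reduced away; sample and batch dims remain.
  return sample_shape + tuple(broadcast[:len(batch_shape)])

print(log_prob_output_shape((1,), batch_shape=(3,)))       # (3,)
print(log_prob_output_shape((2, 2, 1), batch_shape=(3,)))  # (2, 2, 3)
print(log_prob_output_shape((), batch_shape=(2, 3)))       # (2, 3)
```

Passing a non-empty `event_shape` covers multivariate distributions as well: the event dimensions take part in the broadcast but do not appear in the result.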

This might be a mess if you don't know what it means, so let's work some examples:

```
three_poissons.log_prob([10.])
```

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-16.104412 , -2.0785608, -69.05272 ], dtype=float32)>

The tensor `[10.]` (with shape `[1]`) is broadcast across the `batch_shape` of 3, so we evaluate all three Poissons' log probability at the value 10.
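To see where these three numbers come from, the Poisson log probability can be evaluated by hand for each rate. This is a minimal sketch using only the standard library; `poisson_log_prob` is a hypothetical helper based on the textbook formula, not TFP's implementation.

```python
import math

def poisson_log_prob(k, rate):
  # log P(K = k) = k * log(rate) - rate - log(k!), with log(k!) = lgamma(k + 1).
  return k * math.log(rate) - rate - math.lgamma(k + 1.)

rates = [1., 10., 100.]  # The batch of three Poisson rates.
# Broadcasting the single value 10 means evaluating it under every rate.
log_probs = [poisson_log_prob(10., rate) for rate in rates]
print(log_probs)
```

Up to float32 rounding, the three values agree with the tensor returned by `three_poissons.log_prob([10.])`.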

```
three_poissons.log_prob([[[1.], [10.]], [[100.], [1000.]]])
```

<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[-1.0000000e+00, -7.6974149e+00, -9.5394836e+01], [-1.6104412e+01, -2.0785608e+00, -6.9052719e+01]], [[-3.6473938e+02, -1.4348087e+02, -3.2223587e+00], [-5.9131279e+03, -3.6195427e+03, -1.4069575e+03]]], dtype=float32)>

In the above example, the input tensor has shape `[2, 2, 1]`, while the distribution object has a batch shape of `[3]`. So for each of the `[2, 2]` sample dimensions, the single value provided gets broadcast to each of the three Poissons.

A possibly useful way to think of it: because `three_poissons` has `batch_shape = [3]`, a call to `log_prob` must take a Tensor whose last dimension is either 1 or 3; anything else is an error. (The numpy broadcasting rules treat the special case of a scalar as being totally equivalent to a Tensor of shape `[1]`.)

Let's test our chops by playing with the more complex Poisson distribution with `batch_shape = [2, 3]`:

```
poisson_2_by_3 = tfd.Poisson(
    rate=[[1., 10., 100.,], [2., 20., 200.]],
    name='Two-by-Three Poissons')
```

```
poisson_2_by_3.log_prob(1.)
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], dtype=float32)>

```
poisson_2_by_3.log_prob([1.]) # Exactly equivalent to above, demonstrating the scalar special case.
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], dtype=float32)>

```
poisson_2_by_3.log_prob([[1., 1., 1.], [1., 1., 1.]]) # Another way to write the same thing. No broadcasting.
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], dtype=float32)>

```
poisson_2_by_3.log_prob([[1., 10., 100.]]) # Input is [1, 3] broadcast to [2, 3].
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -2.0785608, -3.2223587], [ -1.3068528, -5.14709 , -33.90767 ]], dtype=float32)>

```
poisson_2_by_3.log_prob([[1., 10., 100.], [1., 10., 100.]]) # Equivalent to above. No broadcasting.
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -2.0785608, -3.2223587], [ -1.3068528, -5.14709 , -33.90767 ]], dtype=float32)>

```
poisson_2_by_3.log_prob([[1., 1., 1.], [2., 2., 2.]]) # No broadcasting.
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -14.701683 , -190.09653 ]], dtype=float32)>

```
poisson_2_by_3.log_prob([[1.], [2.]]) # Equivalent to above. Input shape [2, 1] broadcast to [2, 3].
```

<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -14.701683 , -190.09653 ]], dtype=float32)>

The above examples involved broadcasting over the batch, but the sample shape was empty. Suppose we have a collection of values, and we want to get the log probability of each value at each point in the batch. We could do it manually:

```
poisson_2_by_3.log_prob([[[1., 1., 1.], [1., 1., 1.]], [[2., 2., 2.], [2., 2., 2.]]]) # Input shape [2, 2, 3].
```

<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>

Or we could let broadcasting handle the last batch dimension:

```
poisson_2_by_3.log_prob([[[1.], [1.]], [[2.], [2.]]]) # Input shape [2, 2, 1].
```

<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>

We can also (perhaps somewhat less naturally) let broadcasting handle just the first batch dimension:

```
poisson_2_by_3.log_prob([[[1., 1., 1.]], [[2., 2., 2.]]]) # Input shape [2, 1, 3].
```

<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>

Or we could let broadcasting handle *both* batch dimensions:

```
poisson_2_by_3.log_prob([[[1.]], [[2.]]]) # Input shape [2, 1, 1].
```

The above worked fine when we had only two values we wanted, but suppose we had a long list of values we wanted to evaluate at every batch point. For that, the following notation, which adds extra dimensions of size 1 to the right side of the shape, is extremely useful:

```
poisson_2_by_3.log_prob(tf.constant([1., 2.])[..., tf.newaxis, tf.newaxis])
```

This is an instance of strided slice notation, which is worth knowing.
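NumPy arrays support the same indexing notation, so the effect on shapes can be checked without TensorFlow. The sketch below assumes NumPy is installed and uses `np.newaxis` in place of `tf.newaxis`; the `rates` array is just a stand-in with the batch shape `[2, 3]` used above.

```python
import numpy as np

values = np.array([1., 2.])
# `...` keeps all existing dimensions; each newaxis appends a size-1 dimension.
expanded = values[..., np.newaxis, np.newaxis]
print(expanded.shape)  # (2, 1, 1)

# Stand-in for the [2, 3] batch of rates.
rates = np.array([[1., 10., 100.], [2., 20., 200.]])
# Broadcasting the expanded values against the batch gives shape (2, 2, 3).
print(np.broadcast(expanded, rates).shape)
```

The same pattern scales to a long list of values: the list length becomes the sample shape, and the trailing size-1 dimensions broadcast across the whole batch.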

Going back to `three_poissons` for completeness, the same example looks like:

```
three_poissons.log_prob([[1.], [10.], [50.], [100.]])
```

<tf.Tensor: shape=(4, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -16.104412 , -2.0785608, -69.05272 ], [-149.47777 , -43.34851 , -18.219261 ], [-364.73938 , -143.48087 , -3.2223587]], dtype=float32)>

```
three_poissons.log_prob(tf.constant([1., 10., 50., 100.])[..., tf.newaxis]) # Equivalent to above.
```

<tf.Tensor: shape=(4, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -16.104412 , -2.0785608, -69.05272 ], [-149.47777 , -43.34851 , -18.219261 ], [-364.73938 , -143.48087 , -3.2223587]], dtype=float32)>

## Multivariate distributions

We now turn to multivariate distributions, which have non-empty event shape. Let's look at multinomial distributions.

```
multinomial_distributions = [
    # Multinomial is a vector-valued distribution: if we have k classes,
    # an individual sample from the distribution has k values in it, so the
    # event_shape is `[k]`.
    tfd.Multinomial(total_count=100., probs=[.5, .4, .1],
                    name='One Multinomial'),
    tfd.Multinomial(total_count=[100., 1000.], probs=[.5, .4, .1],
                    name='Two Multinomials Same Probs'),
    tfd.Multinomial(total_count=100., probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Same Counts'),
    tfd.Multinomial(total_count=[100., 1000.],
                    probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Different Everything')
]
describe_distributions(multinomial_distributions)
```

tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32) tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32) tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32) tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32)

Note how in the last three examples, the `batch_shape` is always `[2]`, but we can use broadcasting to either have a shared `total_count` or a shared `probs` (or neither), because under the hood they are broadcast to have the same shape.
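One way to picture this split is that the last axis of `probs` indexes the classes and becomes the event shape, while everything else is broadcast against `total_count` to form the batch shape. The sketch below encodes that reading in pure Python; `multinomial_shapes` is a hypothetical helper, and the rule it implements is inferred from the four outputs above rather than taken from TFP's source.

```python
def multinomial_shapes(total_count_shape, probs_shape):
  """Return (batch_shape, event_shape) for the given parameter shapes."""
  # The last axis of `probs` indexes the classes, so it is the event shape.
  event_shape = tuple(probs_shape[-1:])
  probs_batch = tuple(probs_shape[:-1])
  counts = tuple(total_count_shape)
  # Broadcast the remaining dims against total_count, aligned from the right.
  rank = max(len(counts), len(probs_batch))
  counts = (1,) * (rank - len(counts)) + counts
  probs_batch = (1,) * (rank - len(probs_batch)) + probs_batch
  batch_shape = []
  for c, p in zip(counts, probs_batch):
    if c != p and 1 not in (c, p):
      raise ValueError('total_count and probs batch dims do not broadcast')
    batch_shape.append(p if c == 1 else c)
  return tuple(batch_shape), event_shape

print(multinomial_shapes((), (3,)))      # One Multinomial
print(multinomial_shapes((2,), (3,)))    # Two Multinomials Same Probs
print(multinomial_shapes((), (2, 3)))    # Two Multinomials Same Counts
print(multinomial_shapes((2,), (2, 3)))  # Two Multinomials Different Everything
```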

Sampling is straightforward, given what we know already:

```
describe_sample_tensor_shapes(multinomial_distributions, sample_shapes)
```

tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 3) Sample shape: 2 Returned sample tensor shape: (2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 3) tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3) tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3) tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3)

Computing log probabilities is equally straightforward. Let's work an example with diagonal Multivariate Normal distributions. (Multinomials are not very broadcast friendly, since the constraints on the counts and probabilities mean broadcasting will often produce inadmissible values.) We'll use a batch of two 3-dimensional distributions with the same mean but different scales (standard deviations):

```
two_multivariate_normals = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_diag=tf.ones([2, 3]) * [[1.], [2.]])
two_multivariate_normals
```

<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2] event_shape=[3] dtype=float32>

Now let's evaluate the log probability of each batch point at its mean and at a shifted mean:

```
two_multivariate_normals.log_prob([[[1., 2., 3.]], [[3., 4., 5.]]]) # Input has shape [2,1,3].
```

<tf.Tensor: shape=(2, 2), dtype=float32, numpy= array([[-2.7568154, -4.836257 ], [-8.756816 , -6.336257 ]], dtype=float32)>

Exactly equivalently, we can use strided slice notation (https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/strided-slice) to insert an extra shape=1 dimension in the middle of a constant:

```
two_multivariate_normals.log_prob(
    tf.constant([[1., 2., 3.], [3., 4., 5.]])[:, tf.newaxis, :])  # Equivalent to above.
```

<tf.Tensor: shape=(2, 2), dtype=float32, numpy= array([[-2.7568154, -4.836257 ], [-8.756816 , -6.336257 ]], dtype=float32)>

On the other hand, if we don't insert the extra dimension, we pass `[1., 2., 3.]` to the first batch point and `[3., 4., 5.]` to the second:

```
two_multivariate_normals.log_prob(tf.constant([[1., 2., 3.], [3., 4., 5.]]))
```

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2.7568154, -6.336257 ], dtype=float32)>
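The difference between the `[2, 1, 3]` and `[2, 3]` inputs can be reproduced by hand. The sketch below uses only the standard library and relies on the fact that a diagonal covariance makes the coordinates independent, so the joint log density is a sum of univariate Normal log densities. The helpers `normal_log_prob` and `mvn_diag_log_prob` are hypothetical and written for this note.

```python
import math

def normal_log_prob(x, loc, scale):
  z = (x - loc) / scale
  return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2. * math.pi)

def mvn_diag_log_prob(x, loc, scale_diag):
  # Sum over the event dimension: one Normal log density per coordinate.
  return sum(normal_log_prob(xi, li, si)
             for xi, li, si in zip(x, loc, scale_diag))

loc = [1., 2., 3.]
scale_diags = [[1., 1., 1.], [2., 2., 2.]]  # batch_shape = [2]
points = [[1., 2., 3.], [3., 4., 5.]]       # Two event-shaped inputs.

# Every point against every batch member: the [2, 1, 3] input case.
all_pairs = [[mvn_diag_log_prob(x, loc, s) for s in scale_diags]
             for x in points]
# Point i against batch member i: the [2, 3] input case.
paired = [mvn_diag_log_prob(x, loc, s) for x, s in zip(points, scale_diags)]
print(all_pairs)
print(paired)
```

`paired` is the diagonal of `all_pairs`, which is exactly what is lost or gained by inserting the extra dimension.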

## Shape Manipulation Techniques

### The Reshape Bijector

The `Reshape` bijector can be used to reshape the *event_shape* of a distribution. Let's see an example:

```
six_way_multinomial = tfd.Multinomial(total_count=1000., probs=[.3, .25, .2, .15, .08, .02])
six_way_multinomial
```

<tfp.distributions.Multinomial 'Multinomial' batch_shape=[] event_shape=[6] dtype=float32>

We created a multinomial with an event shape of `[6]`. The Reshape Bijector allows us to treat this as a distribution with an event shape of `[2, 3]`.

A `Bijector` represents a differentiable, one-to-one function on an open subset of \({\mathbb R}^n\). `Bijectors` are used in conjunction with `TransformedDistribution`, which models a distribution \(p(y)\) in terms of a base distribution \(p(x)\) and a `Bijector` that represents \(Y = g(X)\). Let's see it in action:

```
transformed_multinomial = tfd.TransformedDistribution(
    distribution=six_way_multinomial,
    bijector=tfb.Reshape(event_shape_out=[2, 3]))
transformed_multinomial
```

<tfp.distributions.TransformedDistribution 'reshapeMultinomial' batch_shape=[] event_shape=[2, 3] dtype=float32>

```
six_way_multinomial.log_prob([500., 100., 100., 150., 100., 50.])
```

<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>

```
transformed_multinomial.log_prob([[500., 100., 100.], [150., 100., 50.]])
```

<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>

This is the *only* thing the `Reshape` bijector can do: it cannot turn event dimensions into batch dimensions or vice-versa.
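The two identical log probabilities above can be checked by hand. The sketch below evaluates the multinomial log probability with the standard library and treats the reshape as a pure rearrangement: flattening the `[2, 3]` event back to `[6]` in row-major order gives the original counts, so the value is unchanged. `multinomial_log_prob` is a hypothetical helper, and the row-major ordering is an assumption consistent with the example.

```python
import math

def multinomial_log_prob(counts, probs):
  total = sum(counts)
  # log of the multinomial coefficient total! / (c_1! * ... * c_k!).
  log_coeff = math.lgamma(total + 1.) - sum(math.lgamma(c + 1.) for c in counts)
  return log_coeff + sum(c * math.log(p) for c, p in zip(counts, probs))

probs = [.3, .25, .2, .15, .08, .02]
flat_counts = [500., 100., 100., 150., 100., 50.]          # event shape [6]
reshaped_counts = [[500., 100., 100.], [150., 100., 50.]]  # event shape [2, 3]

# Undo the reshape by flattening row-major, then evaluate the base log_prob.
unflattened = [c for row in reshaped_counts for c in row]
flat_log_prob = multinomial_log_prob(flat_counts, probs)
reshaped_log_prob = multinomial_log_prob(unflattened, probs)
print(flat_log_prob, reshaped_log_prob)
```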

### The Independent Distribution

The `Independent` distribution is used to treat a collection of independent, not-necessarily-identical (aka a batch of) distributions as a single distribution. More concisely, `Independent` allows us to convert dimensions in `batch_shape` to dimensions in `event_shape`. We'll illustrate by example:

```
two_by_five_bernoulli = tfd.Bernoulli(
    probs=[[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]],
    name="Two By Five Bernoulli")
two_by_five_bernoulli
```

<tfp.distributions.Bernoulli 'Two_By_Five_Bernoulli' batch_shape=[2, 5] event_shape=[] dtype=int32>

We can think of this as a two-by-five array of coins with the associated probabilities of heads. Let's evaluate the probability of a particular, arbitrary set of ones-and-zeros:

```
pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]
two_by_five_bernoulli.log_prob(pattern)
```

<tf.Tensor: shape=(2, 5), dtype=float32, numpy= array([[-2.9957323 , -0.10536051, -0.16251892, -1.609438 , -0.2876821 ], [-0.35667497, -0.4307829 , -0.91629076, -0.79850775, -0.6931472 ]], dtype=float32)>

We can use `Independent` to turn this into two different "sets of five Bernoullis", which is useful if we want to consider a "row" of coin flips coming up in a given pattern as a single outcome:

```
two_sets_of_five = tfd.Independent(
    distribution=two_by_five_bernoulli,
    reinterpreted_batch_ndims=1,
    name="Two Sets Of Five")
two_sets_of_five
```

<tfp.distributions.Independent 'Two_Sets_Of_Five' batch_shape=[2] event_shape=[5] dtype=int32>

Mathematically, we're computing the log probability of each "set" of five by summing the log probabilities of the five "independent" coin flips in the set, which is where the distribution gets its name:

```
two_sets_of_five.log_prob(pattern)
```

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-5.160732 , -3.1954036], dtype=float32)>

We can go even further and use `Independent` to create a distribution where individual events are a set of two-by-five Bernoullis:

```
one_set_of_two_by_five = tfd.Independent(
    distribution=two_by_five_bernoulli, reinterpreted_batch_ndims=2,
    name="One Set Of Two By Five")
one_set_of_two_by_five.log_prob(pattern)
```

<tf.Tensor: shape=(), dtype=float32, numpy=-8.356134>
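The three results (shape `[2, 5]`, shape `[2]` and a scalar) are related by plain summation over the reinterpreted dimensions. The following standard-library sketch recomputes them from the same `probs` and `pattern`; `bernoulli_log_prob` is a hypothetical helper, and the code mirrors the arithmetic only, not TFP's implementation of `Independent`.

```python
import math

probs = [[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]]
pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]

def bernoulli_log_prob(x, p):
  # log p for an outcome of 1, log(1 - p) for an outcome of 0.
  return math.log(p) if x == 1. else math.log(1. - p)

# batch_shape [2, 5]: one log probability per coin.
elementwise = [[bernoulli_log_prob(x, p) for x, p in zip(xs, ps)]
               for xs, ps in zip(pattern, probs)]
# reinterpreted_batch_ndims=1: sum over the last batch dimension.
per_row = [sum(row) for row in elementwise]
# reinterpreted_batch_ndims=2: sum over both batch dimensions.
total = sum(per_row)
print(per_row, total)
```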

It's worth noting that from the perspective of `sample`, using `Independent` changes nothing:

```
describe_sample_tensor_shapes(
    [two_by_five_bernoulli,
     two_sets_of_five,
     one_set_of_two_by_five],
    [[3, 5]])
```

tfp.distributions.Bernoulli("Two_By_Five_Bernoulli", batch_shape=[2, 5], event_shape=[], dtype=int32) Sample shape: [3, 5] Returned sample tensor shape: (3, 5, 2, 5) tfp.distributions.Independent("Two_Sets_Of_Five", batch_shape=[2], event_shape=[5], dtype=int32) Sample shape: [3, 5] Returned sample tensor shape: (3, 5, 2, 5) tfp.distributions.Independent("One_Set_Of_Two_By_Five", batch_shape=[], event_shape=[2, 5], dtype=int32) Sample shape: [3, 5] Returned sample tensor shape: (3, 5, 2, 5)

As a parting exercise for the reader, we suggest considering the differences and similarities between a vector batch of `Normal` distributions and a `MultivariateNormalDiag` distribution from a sampling and log probability perspective. How can we use `Independent` to construct a `MultivariateNormalDiag` from a batch of `Normal`s? (Note that `MultivariateNormalDiag` is not actually implemented this way.)