Build the Masked Autoregressive Density Estimator (Germain et al., 2015).
```python
tfp.substrates.jax.bijectors.masked_autoregressive_default_template(
    hidden_layers,
    shift_only=False,
    activation=tf.nn.relu,
    log_scale_min_clip=-5.0,
    log_scale_max_clip=3.0,
    log_scale_clip_gradient=False,
    name=None,
    *args,
    **kwargs
)
```
This will be wrapped in a `make_template` to ensure the variables are only created once. It takes the input and returns the `loc` ('mu' in [Germain et al. (2015)][1]) and `log_scale` ('alpha' in [Germain et al. (2015)][1]) from the MADE network.
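For illustration, here is a minimal sketch of building the template and calling it directly. It assumes the TensorFlow backend rather than the JAX substrate (the template builds its variables with TF1-style `make_template` and `tf.compat.v1.layers`), and the layer sizes and 3-dimensional input are arbitrary choices for the sketch:

```python
import tensorflow.compat.v1 as tf1
import tensorflow_probability as tfp

tfb = tfp.bijectors
tf1.disable_v2_behavior()  # The template relies on TF1-style graph variables.

# Build the template once; its variables are created on the first call only.
# Each hidden layer (16 units) exceeds the input depth (3), as required by
# the "About Hidden Layers" note below.
shift_and_log_scale_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[16, 16])

x = tf1.placeholder(tf1.float32, shape=[None, 3])
shift, log_scale = shift_and_log_scale_fn(x)  # Each has shape [None, 3].
```

These are the two outputs that the `MaskedAutoregressiveFlow` bijector consumes; schematically, its forward pass computes `y = x * exp(log_scale) + shift`.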
About Hidden Layers
Each element of `hidden_layers` should be greater than the `input_depth` (i.e., `input_depth = tf.shape(input)[-1]`, where `input` is the input to the neural network). This is necessary to ensure the autoregressivity property.
About Clipping
This function also optionally clips the `log_scale` (but possibly not its gradient). This is useful because if `log_scale` is too small/large it might underflow/overflow, making it impossible for the `MaskedAutoregressiveFlow` bijector to implement a bijection. Additionally, the `log_scale_clip_gradient` `bool` indicates whether the gradient should also be clipped. The default does not clip the gradient; this is useful because it still provides gradient information (for fitting) yet solves the numerical stability problem. I.e., `log_scale_clip_gradient = False` means `grad[exp(clip(x))] = grad[x] exp(clip(x))` rather than the usual `grad[clip(x)] exp(clip(x))`.
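The sketch below illustrates (it is not the library's internal code) the value-clipped, gradient-preserving behaviour that `log_scale_clip_gradient=False` describes; the helper name `clip_preserve_gradient` and the test values are made up for the example:

```python
import tensorflow as tf

def clip_preserve_gradient(x, clip_min=-5.0, clip_max=3.0):
  """Clips the forward value but lets gradients pass through unclipped."""
  clipped = tf.clip_by_value(x, clip_min, clip_max)
  # Forward pass evaluates to `clipped`; backward pass sees the identity.
  return x + tf.stop_gradient(clipped - x)

x = tf.constant([-10.0, 0.0, 10.0])
with tf.GradientTape() as tape:
  tape.watch(x)
  y = tf.exp(clip_preserve_gradient(x))
grads = tape.gradient(y, x)
# grads == exp(clip(x)) = [exp(-5), 1, exp(3)], i.e. grad[x] * exp(clip(x));
# clipping the gradient as well would zero the first and last entries.
print(grads)
```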
Args | |
---|---|
`hidden_layers` | Python `list`-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: [512, 512].
`shift_only` | Python `bool` indicating if only the `shift` term shall be computed. Default: False.
`activation` | Activation function (callable). Explicitly setting to `None` implies a linear activation.
`log_scale_min_clip` | `float`-like scalar `Tensor`, or a `Tensor` with the same shape as `log_scale`. The minimum value to clip by. Default: -5.
`log_scale_max_clip` | `float`-like scalar `Tensor`, or a `Tensor` with the same shape as `log_scale`. The maximum value to clip by. Default: 3.
`log_scale_clip_gradient` | Python `bool` indicating that the gradient of [`tf.clip_by_value`](https://www.tensorflow.org/api_docs/python/tf/clip_by_value) should be preserved. Default: False.
`name` | A name for ops managed by this function. Default: 'masked_autoregressive_default_template'.
`*args` | `tf.layers.dense` arguments.
`**kwargs` | `tf.layers.dense` keyword arguments.
Returns | |
---|---|
`shift` | Float-like `Tensor` of shift terms (the 'mu' in [Germain et al. (2015)][1]).
`log_scale` | Float-like `Tensor` of log(scale) terms (the 'alpha' in [Germain et al. (2015)][1]).
Raises | |
---|---|
`NotImplementedError` | if rightmost dimension of inputs is unknown prior to graph execution.
References
[1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03509