Build the Masked Autoregressive Density Estimator (Germain et al., 2015). (deprecated)


This function is wrapped in a make_template so that its variables are created only once. The resulting template takes the input and returns the shift (loc, "mu" in [Germain et al. (2015)][1]) and log_scale ("alpha" in [Germain et al. (2015)][1]) from the MADE network.

About Hidden Layers

Each element of hidden_layers should be greater than the input_depth (i.e., input_depth = tf.shape(input)[-1] where input is the input to the neural network). This is necessary to ensure the autoregressivity property.
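To make the width requirement concrete, here is a minimal pure-Python sketch of one block-style masking scheme. The helper names (`block_mask`, `connectivity`, `is_autoregressive`) are hypothetical, and the construction is an assumption chosen for illustration; it is not claimed to be the library's exact implementation. It counts the masked paths from each input to each output and checks that output i is reachable only from inputs j < i.

```python
def block_mask(num_blocks, n_in, n_out, exclusive):
    """Return an n_out x n_in 0/1 mask; mask[r][c] == 1 lets output r see input c."""
    d_in = n_in // num_blocks    # input units per block (0 if n_in < num_blocks)
    d_out = n_out // num_blocks  # output units per block (0 if n_out < num_blocks)
    mask = [[0] * n_in for _ in range(n_out)]
    row = d_out if exclusive else 0  # exclusive: a block may not see its own input
    col = 0
    for _ in range(num_blocks):
        for r in range(row, n_out):
            for c in range(col, col + d_in):
                mask[r][c] = 1
        row += d_out
        col += d_in
    return mask


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def connectivity(input_depth, hidden_layers):
    """Entry [i][j] counts the masked paths from input j to output i."""
    sizes = [input_depth] + list(hidden_layers) + [input_depth]
    conn = None
    for layer, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        # Only the first layer uses the exclusive (strictly lower) mask.
        mask = block_mask(input_depth, n_in, n_out, exclusive=(layer == 0))
        conn = mask if conn is None else matmul(mask, conn)
    return conn


def is_autoregressive(conn):
    """True if output i has no path from any input j >= i."""
    return all(conn[i][j] == 0 for i in range(len(conn)) for j in range(i, len(conn)))


wide = connectivity(3, [6])    # hidden width above input_depth
narrow = connectivity(3, [2])  # hidden width below input_depth
```

In this sketch, `wide` is strictly lower triangular, so each output depends only on earlier inputs. For `narrow`, the integer division yields a block size of zero and every path vanishes, so the masks no longer encode a usable autoregressive structure.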

About Clipping

This function also optionally clips the log_scale (but possibly not its gradient). This is useful because if log_scale is too small or too large, exp(log_scale) might underflow or overflow, making it impossible for the MaskedAutoregressiveFlow bijector to implement a bijection. The log_scale_clip_gradient bool indicates whether the gradient should also be clipped. The default does not clip the gradient; this still provides gradient information (for fitting) while solving the numerical stability problem. That is, log_scale_clip_gradient = False means grad[exp(clip(x))] = grad[x] exp(clip(x)) rather than the usual grad[clip(x)] exp(clip(x)).
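The two gradient conventions can be compared with a small scalar sketch. It uses analytic derivatives in plain Python rather than TensorFlow, and the function name `scale_and_grad` is hypothetical. The clip bounds are taken from the documented defaults (-5 and 3). Treating the derivative of the clip as 1 on the closed interval is an assumption made here for simplicity.

```python
import math


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def scale_and_grad(log_scale, lo=-5.0, hi=3.0, clip_gradient=False):
    """Return exp(clip(log_scale)) and its derivative with respect to log_scale.

    clip_gradient=False: the clip is treated as the identity when differentiating,
        so the derivative is exp(clip(x)).
    clip_gradient=True: the usual chain rule applies, so the derivative is
        clip'(x) * exp(clip(x)), which is zero outside [lo, hi].
    """
    scale = math.exp(clip(log_scale, lo, hi))
    if clip_gradient:
        inside = 1.0 if lo <= log_scale <= hi else 0.0  # assumed derivative of clip
        return scale, inside * scale
    return scale, scale


# Outside the clip range the value is bounded in both modes,
# but only the default mode keeps a nonzero gradient.
preserved = scale_and_grad(10.0)
clipped = scale_and_grad(10.0, clip_gradient=True)
```

Inside the clip range both modes agree; they differ only where the input has been clipped.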


Args

  • hidden_layers: Python list-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: [512, 512].
  • shift_only: Python bool indicating if only the shift term shall be computed. Default: False.
  • activation: Activation function (callable). Explicitly setting to None implies a linear activation.
  • log_scale_min_clip: float-like scalar Tensor, or a Tensor with the same shape as log_scale. The minimum value to clip by. Default: -5.
  • log_scale_max_clip: float-like scalar Tensor, or a Tensor with the same shape as log_scale. The maximum value to clip by. Default: 3.
  • log_scale_clip_gradient: Python bool. If True, the gradient of tf.clip_by_value is preserved, so the gradient is clipped along with the value; if False, only the value is clipped. Default: False.
  • name: A name for ops managed by this function. Default: "masked_autoregressive_default_template".
  • *args: tf.compat.v1.layers.dense arguments.
  • **kwargs: tf.compat.v1.layers.dense keyword arguments.


Returns

  • shift: Float-like Tensor of shift terms (the "mu" in [Germain et al. (2015)][1]).
  • log_scale: Float-like Tensor of log(scale) terms (the "alpha" in [Germain et al. (2015)][1]).
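As a rough illustration of how a shift and log_scale pair is typically consumed, the sketch below applies an elementwise affine map and its inverse in plain Python. The affine form y = x * exp(log_scale) + shift is the one commonly used by masked autoregressive flows. The function names are hypothetical, and representing the shift-only case by `log_scale=None` is an assumption made for this example.

```python
import math


def affine_forward(x, shift, log_scale=None):
    """Compute y = x * exp(log_scale) + shift elementwise."""
    if log_scale is None:  # shift-only case (assumed): no scaling is applied
        return [xi + si for xi, si in zip(x, shift)]
    return [xi * math.exp(ai) + si for xi, si, ai in zip(x, shift, log_scale)]


def affine_inverse(y, shift, log_scale=None):
    """Compute x = (y - shift) * exp(-log_scale) elementwise."""
    if log_scale is None:
        return [yi - si for yi, si in zip(y, shift)]
    return [(yi - si) * math.exp(-ai) for yi, si, ai in zip(y, shift, log_scale)]


x = [1.0, -2.0]
shift = [0.5, 0.5]
log_scale = [0.0, math.log(2.0)]
y = affine_forward(x, shift, log_scale)
x_recovered = affine_inverse(y, shift, log_scale)
```

Because the scale enters as exp(log_scale), it is always positive and the map stays invertible as long as the exponential neither underflows to zero nor overflows.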


Raises

  • NotImplementedError: if rightmost dimension of inputs is unknown prior to graph execution.


References

[1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03509