Defined in tensorflow/contrib/distributions/python/ops/bijectors/

Build the Masked Autoregressive Density Estimator (Germain et al., 2015). (deprecated)

THIS FUNCTION IS DEPRECATED. It will be removed after 2018-10-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability. You should update all references to use tfp.distributions instead of tf.contrib.distributions.

The returned function will be wrapped in a make_template to ensure that the variables are created only once. It takes the input tensor and returns the loc ("mu" in [Germain et al. (2015)][1]) and log_scale ("alpha" in [Germain et al. (2015)][1]) computed by the MADE network.
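The create-once/reuse-thereafter behavior of a template can be sketched in plain Python. Here `make_template_like` and `get_variable` are hypothetical stand-ins for tf.make_template and tf.get_variable, not the TensorFlow implementation:

```python
created = []  # records each variable creation, to show it happens once

def get_variable(store, name, init):
    # Hypothetical stand-in for tf.get_variable with reuse: create the
    # variable on the first request, return the cached one afterwards.
    if name not in store:
        store[name] = init
        created.append(name)
    return store[name]

def make_template_like(build_fn):
    # Hypothetical stand-in for tf.make_template: every call to the
    # returned function shares a single variable store.
    store = {}
    def template(x):
        return build_fn(store, x)
    return template

def made_net(store, x):
    # Toy "network": loc and log_scale each own one scalar variable.
    loc = get_variable(store, "loc", 0.0)
    log_scale = get_variable(store, "log_scale", 0.0)
    return [xi + loc for xi in x], [log_scale for _ in x]

template = make_template_like(made_net)
template([1.0, 2.0])
template([3.0, 4.0])  # second call reuses the same variables
```

After both calls, `created` contains each variable name exactly once, which is the guarantee make_template provides for the real loc/log_scale parameters.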

About Hidden Layers

Each element of hidden_layers should be greater than the input_depth (i.e., input_depth = tf.shape(input)[-1] where input is the input to the neural network). This is necessary to preserve the autoregressive property.
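Why the width matters can be seen from the MADE masks themselves. The sketch below builds masks from "degrees" using a common sequential assignment (the actual TF implementation may differ in detail); when a hidden layer is too narrow, some degrees are missing and the network silently drops dependencies that the autoregressive factorization is entitled to use:

```python
def made_masks(input_depth, hidden_units):
    # MADE-style binary masks built from per-unit "degrees" (a common
    # sequential assignment; hypothetical, not the TF implementation).
    in_deg = list(range(1, input_depth + 1))
    hid_deg = [1 + k % (input_depth - 1) for k in range(hidden_units)]
    # Hidden unit k may see input i iff hid_deg[k] >= in_deg[i].
    mask1 = [[int(hid_deg[k] >= in_deg[i]) for k in range(hidden_units)]
             for i in range(input_depth)]
    # Output j may see hidden unit k iff in_deg[j] > hid_deg[k] (strict,
    # so output j never depends on input j itself).
    mask2 = [[int(in_deg[j] > hid_deg[k]) for j in range(input_depth)]
             for k in range(hidden_units)]
    return mask1, mask2

def connectivity(mask1, mask2):
    # conn[i][j] == 1 iff some masked path links input i to output j.
    depth, hidden = len(mask1), len(mask2)
    return [[int(any(mask1[i][k] and mask2[k][j] for k in range(hidden)))
             for j in range(depth)] for i in range(depth)]
```

With input_depth = 4 and a wide hidden layer (say 8 units), every degree 1..3 appears, so output j depends on exactly the inputs before it. With only 2 hidden units, degree 3 never occurs and output 4 loses its dependence on input 3.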

About Clipping

This function also optionally clips the log_scale (but possibly not its gradient). This is useful because if log_scale is too small (or too large) it might underflow (or overflow), making it impossible for the MaskedAutoregressiveFlow bijector to implement a bijection. The log_scale_clip_gradient bool indicates whether the gradient should also be clipped. The default does not clip the gradient; this is useful because it still provides gradient information (for fitting) while solving the numerical stability problem. I.e., log_scale_clip_gradient = False means grad[exp(clip(x))] = grad[x] exp(clip(x)) rather than the usual grad[clip(x)] exp(clip(x)).
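The two gradient conventions can be compared numerically. The following is a hand-derived sketch of d/dx exp(clip(x)) under each convention (not the TF implementation), using the default clip bounds from this function:

```python
import math

LOG_SCALE_MIN, LOG_SCALE_MAX = -5.0, 3.0  # default clip bounds

def clip(x, lo=LOG_SCALE_MIN, hi=LOG_SCALE_MAX):
    return max(lo, min(hi, x))

def grad_exp_of_clipped(x, clip_gradient):
    # Hand-derived d/dx exp(clip(x)) under the two conventions:
    #   clip_gradient=True:  grad[clip(x)] * exp(clip(x)) -> zero outside bounds
    #   clip_gradient=False: grad[x] * exp(clip(x))       -> nonzero everywhere
    inside = LOG_SCALE_MIN <= x <= LOG_SCALE_MAX
    passthrough = math.exp(clip(x))
    return passthrough if (not clip_gradient or inside) else 0.0
```

At x = 10 the clipped-gradient convention returns 0 (no learning signal at all), while the pass-through convention returns exp(3), so the optimizer can still push log_scale back into range. Inside the bounds the two conventions agree.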


Args

  • hidden_layers: Python list-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: [512, 512].
  • shift_only: Python bool indicating if only the shift term shall be computed. Default: False.
  • activation: Activation function (callable). Explicitly setting to None implies a linear activation.
  • log_scale_min_clip: float-like scalar Tensor, or a Tensor with the same shape as log_scale. The minimum value to clip by. Default: -5.
  • log_scale_max_clip: float-like scalar Tensor, or a Tensor with the same shape as log_scale. The maximum value to clip by. Default: 3.
  • log_scale_clip_gradient: Python bool indicating that the gradient of tf.clip_by_value should be preserved. Default: False.
  • name: A name for ops managed by this function. Default: "masked_autoregressive_default_template".
  • *args: tf.layers.dense arguments.
  • **kwargs: tf.layers.dense keyword arguments.


Returns

  • shift: Float-like Tensor of shift terms (the "mu" in [Germain et al. (2015)][1]).
  • log_scale: Float-like Tensor of log(scale) terms (the "alpha" in [Germain et al. (2015)][1]).


Raises

  • NotImplementedError: if the rightmost dimension of inputs is unknown prior to graph execution.


[1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: Masked Autoencoder for Distribution Estimation. In International Conference on Machine Learning, 2015. https://arxiv.org/abs/1502.03509