tf.contrib.layers.layer_norm(
    inputs,
    center=True,
    scale=True,
    activation_fn=None,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    begin_norm_axis=1,
    begin_params_axis=-1,
    scope=None
)
Adds a Layer Normalization layer.
Based on the paper:
"Layer Normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton (https://arxiv.org/abs/1607.06450).
Can be used as a normalizer function for conv2d and fully_connected.
Given a tensor inputs of rank R, moments are calculated and normalization is performed over axes begin_norm_axis ... R - 1. Scaling and centering, if requested, are performed over axes begin_params_axis ... R - 1.
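As a rough illustration of the computation described above, the following NumPy sketch normalizes over the trailing axes and then applies the optional scale and offset by broadcasting. It is not the library's implementation: the function name layer_norm_sketch and the eps value are assumptions made for this example.

```python
import numpy as np

def layer_norm_sketch(x, gamma=None, beta=None, begin_norm_axis=1, eps=1e-12):
    """Hypothetical NumPy reference: normalize x over axes begin_norm_axis ... R - 1."""
    rank = x.ndim
    if begin_norm_axis < 0:
        begin_norm_axis += rank  # allow negative axis indices
    norm_axes = tuple(range(begin_norm_axis, rank))

    # Moments are computed jointly over all normalization axes.
    mean = x.mean(axis=norm_axes, keepdims=True)
    var = x.var(axis=norm_axes, keepdims=True)
    out = (x - mean) / np.sqrt(var + eps)  # eps is an assumed stabilizer

    # gamma and beta have shape x.shape[begin_params_axis:], so they
    # broadcast against the trailing axes of the normalized tensor.
    if gamma is not None:
        out = out * gamma
    if beta is not None:
        out = out + beta
    return out

x = np.random.RandomState(0).randn(2, 4, 4, 3)  # NHWC-shaped input
gamma = np.ones(3)   # shape x.shape[-1:]
beta = np.zeros(3)
y = layer_norm_sketch(x, gamma, beta)
print(y.shape)
```

With unit gamma and zero beta, each example in the batch ends up with approximately zero mean and unit variance over its H, W, and C axes.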
By default, begin_norm_axis = 1 and begin_params_axis = -1, meaning that normalization is performed over all but the first axis (the HWC if inputs is NHWC), while the beta and gamma trainable parameters are calculated for the rightmost axis (the C if inputs is NHWC). Scaling and recentering are performed via broadcast of the beta and gamma parameters with the normalized tensor.
The shapes of beta and gamma are inputs.shape[begin_params_axis:], and this part of the inputs' shape must be fully defined.
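To make the axis bookkeeping concrete, here is a small pure-Python sketch that derives the normalization axes and the beta/gamma shape from an input shape. The helper name resolve_layer_norm_axes is hypothetical and exists only for this example; it mirrors the rules stated above rather than any library code.

```python
def resolve_layer_norm_axes(input_shape, begin_norm_axis=1, begin_params_axis=-1):
    """Hypothetical helper: return (normalization axes, beta/gamma shape)."""
    rank = len(input_shape)
    if begin_norm_axis < 0:
        begin_norm_axis += rank
    if begin_params_axis < 0:
        begin_params_axis += rank
    norm_axes = list(range(begin_norm_axis, rank))        # begin_norm_axis ... R - 1
    params_shape = tuple(input_shape[begin_params_axis:])  # shape of beta and gamma
    return norm_axes, params_shape

# Defaults on an NHWC input: normalize over H, W, C; parameters over C only.
print(resolve_layer_norm_axes((8, 32, 32, 16)))        # ([1, 2, 3], (16,))
# One parameter per H, W, C position.
print(resolve_layer_norm_axes((8, 32, 32, 16), 1, 1))  # ([1, 2, 3], (32, 32, 16))
# Normalize only the last axis of a rank-3 input.
print(resolve_layer_norm_axes((8, 10, 64), -1, -1))    # ([2], (64,))
```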
inputs: A tensor having rank R. The normalization is performed over axes begin_norm_axis ... R - 1 and centering and scaling parameters are calculated over begin_params_axis ... R - 1.
center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to which the outputs are added.
trainable: If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
begin_norm_axis: The first normalization dimension: normalization will be performed along dimensions begin_norm_axis : rank(inputs).
begin_params_axis: The first parameter (beta, gamma) dimension: scale and centering parameters will have dimensions begin_params_axis : rank(inputs) and will be broadcast with the normalized inputs accordingly.
scope: Optional scope for variable_scope.
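The note on the scale argument can be checked numerically. The NumPy sketch below, using made-up arrays, shows that a per-feature gamma applied before a linear layer gives the same result as folding gamma into that layer's weights. The ReLU check additionally assumes gamma is strictly positive, which the documentation above does not state.

```python
import numpy as np

rng = np.random.RandomState(0)
x_norm = rng.randn(5, 8)    # stand-in for a normalized activation
gamma = rng.rand(8) + 0.5   # per-feature scale, strictly positive here
W = rng.randn(8, 3)         # weights of a following linear layer

# Scaling before the linear layer equals folding gamma into the weights.
scaled_then_linear = (x_norm * gamma) @ W
folded_into_weights = x_norm @ (gamma[:, None] * W)
linear_ok = np.allclose(scaled_then_linear, folded_into_weights)

# For positive gamma, scaling also commutes with ReLU, so a later layer
# can still absorb it (assumption: gamma > 0).
relu_ok = np.allclose(np.maximum(x_norm * gamma, 0.0),
                      gamma * np.maximum(x_norm, 0.0))
print(linear_ok, relu_ok)
```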
Returns: A Tensor representing the output of the operation, having the same shape and dtype as inputs.
ValueError: If the rank of inputs is not known at graph build time, or if inputs.shape[begin_params_axis:] is not fully defined at graph build time.
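The two error conditions can be sketched in plain Python by modeling a static shape as a tuple in which None marks an unknown dimension, and an unknown rank as None itself. The function name check_layer_norm_inputs and the error messages are hypothetical; this only mirrors the conditions listed above and is not the library's validation code.

```python
def check_layer_norm_inputs(static_shape, begin_params_axis=-1):
    """Hypothetical check mirroring the documented ValueError conditions."""
    if static_shape is None:  # rank unknown at graph build time
        raise ValueError("inputs has undefined rank")
    params_shape = tuple(static_shape[begin_params_axis:])
    if any(dim is None for dim in params_shape):
        raise ValueError(
            "inputs.shape[begin_params_axis:] is not fully defined: %r"
            % (params_shape,))
    return params_shape

# An unknown batch dimension is fine with the default begin_params_axis.
print(check_layer_norm_inputs((None, 32, 32, 16)))  # (16,)

# An unknown dimension inside the parameter axes is rejected.
try:
    check_layer_norm_inputs((None, None, 16), begin_params_axis=1)
except ValueError as err:
    print("rejected:", err)
```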