tf.contrib.layers.group_norm

Functional interface for the group normalization layer.

tf.contrib.layers.group_norm(
    inputs, groups=32, channels_axis=-1, reduction_axes=(-3, -2), center=True,
    scale=True, epsilon=1e-06, activation_fn=None, param_initializers=None,
    reuse=None, variables_collections=None, outputs_collections=None,
    trainable=True, scope=None, mean_close_to_zero=False
)

Reference: https://arxiv.org/abs/1803.08494

"Group Normalization", Yuxin Wu, Kaiming He

Args
`inputs`	A Tensor with at least 2 dimensions, one of which is channels. All shape dimensions except for batch must be fully defined.
`groups`	Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must be commensurate with the number of channels in `inputs` (that is, the number of channels must be divisible by `groups`).
`channels_axis`	An integer. Specifies the index of the channels axis, which is split into `groups`; statistics are computed separately within each group. Must be mutually exclusive with `reduction_axes`. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
`reduction_axes`	Tuple of integers. Specifies dimensions over which statistics will be accumulated. Must be mutually exclusive with `channels_axis`. Statistics will not be accumulated across axes that appear in neither `reduction_axes` nor `channels_axis`. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included. Sample usage: for NHWC format, `channels_axis=-1, reduction_axes=[-3, -2]`; for NCHW format, `channels_axis=-3, reduction_axes=[-2, -1]`.
`center`	If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored.
`scale`	If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (or, for example, `nn.relu`), this can be disabled since the scaling can be done by the next layer.
`epsilon`	Small float added to variance to avoid dividing by zero.
`activation_fn`	Activation function. Defaults to None, which skips it and maintains a linear activation.
`param_initializers`	Optional initializers for beta, gamma, moving mean and moving variance.
`reuse`	Whether or not the layer and its variables should be reused. To be able to reuse the layer, `scope` must be given.
`variables_collections`	Optional collections for the variables.
`outputs_collections`	Collections to add the outputs.
`trainable`	If `True`, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
`scope`	Optional scope for `variable_scope`.
`mean_close_to_zero`	The mean of `inputs` before ReLU will be close to zero when batch size >= 4k for Resnet-50 on TPU. If `True`, use `nn.sufficient_statistics` and `nn.normalize_moments` to calculate the variance. This is the same behavior as `fused` equal to `True` in batch normalization. If `False`, use `nn.moments` to calculate the variance. When `mean` is close to zero (for example, 1e-4), using `mean` to calculate the variance may give poor results due to repeated roundoff error and denormalization in `mean`. When `mean` is large (for example, 1e2), sum(`inputs`^2) is so large that only the high-order digits of the elements are accumulated. Thus, when `mean` is large, computing the variance as sum((`inputs` - `mean`)^2)/n is more accurate than computing it as sum(`inputs`^2)/n - `mean`^2.
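
The trade-off described for `mean_close_to_zero` can be reproduced with plain Python floats. The sketch below is not the library's code: the two helper functions are hypothetical stand-ins for the two variance formulas, and it uses double precision with a mean of 1e8 (rather than float32 with a mean of 1e2) so that the cancellation is easy to trigger. It illustrates only the large-mean case.

```python
def variance_two_pass(values):
    """sum((x - mean)^2) / n: subtract the mean first, then square."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_sufficient_stats(values):
    """sum(x^2) / n - mean^2: built from sum(x) and sum(x^2) only."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean


# Both lists have a true variance of 2/3; only the mean differs.
small_mean = [-1.0, 0.0, 1.0]
large_mean = [1e8 - 1.0, 1e8, 1e8 + 1.0]

for name, data in (("small mean", small_mean), ("large mean", large_mean)):
    print(name, variance_two_pass(data), variance_sufficient_stats(data))
```

With the large mean, the squares are near 1e16, where adjacent doubles are 2 apart, so the sufficient-statistics formula cannot resolve a variance of 2/3, while the two-pass formula recovers it. With the small mean, the two formulas agree.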

Returns
A `Tensor` representing the output of the operation.
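
To make concrete what the returned values look like, the following is a minimal pure-Python sketch of group normalization for an NHWC input, corresponding to `channels_axis=-1, reduction_axes=(-3, -2)`. It is an illustration under assumptions, not the library's implementation: the function name `group_norm_nhwc` is hypothetical, the input is a nested list rather than a Tensor, and no `beta`, `gamma`, or activation is applied (as if `center=False`, `scale=False`, `activation_fn=None`).

```python
import math


def group_norm_nhwc(x, groups, epsilon=1e-6):
    """Normalize a nested list shaped [N][H][W][C] per sample and per channel group."""
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    if c % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = c // groups
    out = [[[[0.0] * c for _ in range(w)] for _ in range(h)] for _ in range(n)]
    for i in range(n):                      # statistics are never shared across samples
        for g in range(groups):
            chans = range(g * per_group, (g + 1) * per_group)
            # One mean/variance per (sample, group), over H, W and the group's channels.
            vals = [x[i][r][s][k] for r in range(h) for s in range(w) for k in chans]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            inv_std = 1.0 / math.sqrt(var + epsilon)
            for r in range(h):
                for s in range(w):
                    for k in chans:
                        out[i][r][s][k] = (x[i][r][s][k] - mean) * inv_std
    return out


# One sample, 1x2 spatial extent, 4 channels split into 2 groups of 2 channels.
sample = [[[[1.0, 2.0, 10.0, 20.0], [3.0, 4.0, 30.0, 40.0]]]]
normalized = group_norm_nhwc(sample, groups=2)
```

Within each group of each sample, the normalized values have a mean of approximately zero and a variance of approximately one, regardless of the scale of the original channels.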

Raises
`ValueError`	If the rank of `inputs` is undefined.
`ValueError`	If rank or channels dimension of `inputs` is undefined.
`ValueError`	If number of groups is not commensurate with number of channels.
`ValueError`	If `reduction_axes` or `channels_axis` is out of bounds.
`ValueError`	If `reduction_axes` is not mutually exclusive with `channels_axis`.
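
The conditions above can be sketched as a standalone check on a static shape. This is an assumption-laden illustration, not the library's validation code: `check_group_norm_args` is a hypothetical helper, the shape is a plain list with `None` for an unknown dimension, and "commensurate" is taken to mean that the number of channels is divisible by `groups`.

```python
def check_group_norm_args(shape, groups, channels_axis=-1, reduction_axes=(-3, -2)):
    """Hypothetical helper mirroring the documented ValueError conditions.

    Returns the non-negative (channels_axis, reduction_axes) on success.
    """
    if shape is None:
        raise ValueError("rank of inputs is undefined")
    rank = len(shape)
    for axis in [channels_axis] + list(reduction_axes):
        if not -rank <= axis < rank:
            raise ValueError("axis %d is out of bounds for rank %d" % (axis, rank))
    # Convert negative axes to their non-negative equivalents.
    channels_axis = channels_axis % rank
    reduction_axes = tuple(a % rank for a in reduction_axes)
    if channels_axis in reduction_axes:
        raise ValueError("reduction_axes must be mutually exclusive with channels_axis")
    channels = shape[channels_axis]
    if channels is None:
        raise ValueError("channels dimension of inputs is undefined")
    if channels % groups != 0:
        raise ValueError(
            "%d channels is not commensurate with %d groups" % (channels, groups))
    return channels_axis, reduction_axes


# The two layouts from the `reduction_axes` description, with an unknown batch size.
nhwc = check_group_norm_args([None, 32, 32, 64], groups=32)
nchw = check_group_norm_args([None, 64, 32, 32], groups=32,
                             channels_axis=-3, reduction_axes=(-2, -1))
```

Because the axes are given as negative integers, the same arguments resolve correctly whether or not a leading batch dimension is present.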
