Functional interface for the group normalization layer.


"Group Normalization", Yuxin Wu, Kaiming He

inputs A Tensor with at least 2 dimensions, one of which is channels. All shape dimensions except for batch must be fully defined.
groups Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must be commensurate with the number of channels in inputs, that is, the number of channels must be evenly divisible by groups.
channels_axis An integer. Specifies the index of the channels axis, which will be broken into groups; statistics are computed across each group. Must be mutually exclusive with reduction_axes. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
reduction_axes Tuple of integers. Specifies dimensions over which statistics will be accumulated. Must be mutually exclusive with channels_axis. Statistics will not be accumulated across axes that appear in neither reduction_axes nor channels_axis. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.

Some sample usage cases:
  NHWC format: channels_axis=-1, reduction_axes=[-3, -2]
  NCHW format: channels_axis=-3, reduction_axes=[-2, -1]

center If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (or is e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon Small float added to variance to avoid dividing by zero.
activation_fn Activation function, default set to None to skip it and maintain a linear activation.
param_initializers Optional initializers for beta, gamma, moving mean and moving variance.
reuse Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections Optional collections for the variables.
outputs_collections Collections to add the outputs.
trainable If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
scope Optional scope for variable_scope.
mean_close_to_zero The mean of the input before ReLU will be close to zero when batch size >= 4k for Resnet-50 on TPU. If True, use nn.sufficient_statistics and nn.normalize_moments to calculate the variance. This is the same behavior as fused=True in batch normalization. If False, use nn.moments to calculate the variance. When the mean is close to zero, like 1e-4, using the mean to calculate the variance may give poor results due to repeated roundoff error and denormalization in the mean. When the mean is large, like 1e2, sum(input^2) is so large that only the high-order digits of the elements are accumulated. Thus, using sum((input - mean)^2)/n to calculate the variance gives better accuracy than (sum(input^2)/n - mean^2) when the mean is large.
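
The large-mean case described for mean_close_to_zero can be illustrated with a small numerical experiment. The following is a minimal sketch in plain NumPy, not the library's code path: the function names and the sample data are hypothetical, and it only compares the two variance formulas quoted above in float32 against a float64 reference.

```python
import numpy as np


def variance_two_pass(x):
    """Variance as sum((x - mean)^2) / n."""
    mean = x.mean()
    return np.mean((x - mean) ** 2)


def variance_sum_of_squares(x):
    """Variance as sum(x^2) / n - mean^2."""
    mean = x.mean()
    return np.mean(x ** 2) - mean ** 2


# Hypothetical data: float32 values with a large mean (about 1e2)
# and a small spread around it.
rng = np.random.RandomState(0)
x32 = (100.0 + 0.01 * rng.randn(4096)).astype(np.float32)

# Reference variance computed in float64 from the same float32 samples.
reference = np.var(x32.astype(np.float64))

err_two_pass = abs(float(variance_two_pass(x32)) - reference)
err_sum_of_squares = abs(float(variance_sum_of_squares(x32)) - reference)

print("two-pass error:      ", err_two_pass)
print("sum-of-squares error:", err_sum_of_squares)
```

With these inputs the squared values are near 1e4, where float32 can no longer resolve differences on the order of the true variance, so the sum-of-squares form loses essentially all significant digits while the two-pass form does not.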

A Tensor representing the output of the operation.
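
To make the computation concrete, here is a rough NumPy sketch of group normalization for the NHWC case (channels_axis=-1, reduction_axes=[-3, -2]). It is an illustration under stated assumptions rather than the layer's implementation: the helper name group_norm_nhwc is hypothetical, gamma and beta are assumed to be per-channel vectors, the epsilon default is chosen arbitrarily, and variable creation, scopes, collections and activation_fn are omitted.

```python
import numpy as np


def group_norm_nhwc(x, groups, gamma=None, beta=None, epsilon=1e-6):
    """Normalize an NHWC array per sample and per channel group (sketch)."""
    n, h, w, c = x.shape
    if c % groups != 0:
        raise ValueError(
            "%d channels is not commensurate with %d groups." % (c, groups))

    # Break the channels axis into [groups, channels_per_group].
    grouped = x.reshape(n, h, w, groups, c // groups)

    # Accumulate statistics over H, W and the channels inside each group,
    # but not across the batch axis or across different groups.
    mean = grouped.mean(axis=(1, 2, 4), keepdims=True)
    var = grouped.var(axis=(1, 2, 4), keepdims=True)

    out = ((grouped - mean) / np.sqrt(var + epsilon)).reshape(x.shape)

    if gamma is not None:  # corresponds to scale=True
        out = out * gamma
    if beta is not None:   # corresponds to center=True
        out = out + beta
    return out


# Usage: 8 channels split into 4 groups of 2 channels each.
x = np.random.RandomState(0).randn(2, 4, 4, 8)
y = group_norm_nhwc(x, groups=4, gamma=np.ones(8), beta=np.zeros(8))
print(y.shape)
```

Because the statistics never cross the batch axis, the result for one sample does not depend on the other samples in the batch.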

ValueError If the rank of inputs is undefined.
ValueError If rank or channels dimension of inputs is undefined.
ValueError If number of groups is not commensurate with number of channels.
ValueError If reduction_axes or channels_axis are out of bounds.
ValueError If reduction_axes are not mutually exclusive with channels_axis.
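
The error conditions listed above can be mirrored in a small standalone validator. This is a hedged sketch only: check_group_norm_args is a hypothetical helper, not a function of the library, and the exact order and wording of the library's checks may differ. It also shows how negative axis indices resolve for the NHWC and NCHW examples.

```python
def check_group_norm_args(rank, channels, groups,
                          channels_axis=-1, reduction_axes=(-3, -2)):
    """Validate arguments and return non-negative axis indices (sketch)."""
    if rank is None:
        raise ValueError("Rank of inputs is undefined.")
    if channels is None:
        raise ValueError("Channels dimension of inputs is undefined.")
    if channels % groups != 0:
        raise ValueError(
            "%d channels is not commensurate with %d groups."
            % (channels, groups))

    # Every axis must be a valid index for a tensor of this rank.
    for axis in (channels_axis,) + tuple(reduction_axes):
        if not -rank <= axis < rank:
            raise ValueError("Axis %d is out of bounds." % axis)

    # Convert negative indices to their non-negative equivalents.
    channels_axis = channels_axis % rank
    reduction_axes = tuple(axis % rank for axis in reduction_axes)

    if channels_axis in reduction_axes:
        raise ValueError(
            "reduction_axes must be mutually exclusive with channels_axis.")
    return channels_axis, reduction_axes


# NHWC: channels last, statistics over height and width.
print(check_group_norm_args(4, 32, 8, channels_axis=-1, reduction_axes=(-3, -2)))
# NCHW: channels second, statistics over height and width.
print(check_group_norm_args(4, 32, 8, channels_axis=-3, reduction_axes=(-2, -1)))
```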