Functional interface for the group normalization layer.
```python
tf.contrib.layers.group_norm(
    inputs,
    groups=32,
    channels_axis=-1,
    reduction_axes=(-3, -2),
    center=True,
    scale=True,
    epsilon=1e-06,
    activation_fn=None,
    param_initializers=None,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    scope=None,
    mean_close_to_zero=False
)
```
"Group Normalization", Yuxin Wu, Kaiming He
Args:
inputs: A Tensor with at least 2 dimensions, one of which is channels. All shape dimensions except for batch must be fully defined.
groups: Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must be commensurate with the number of channels in inputs.
channels_axis: An integer. Specifies the index of the channels axis, which will be broken into groups; statistics are computed across each group. Must be mutually exclusive with reduction_axes. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
reduction_axes: Tuple of integers. Specifies the dimensions over which statistics will be accumulated. Must be mutually exclusive with channels_axis. Statistics will not be accumulated across axes not specified in reduction_axes or channels_axis. Preferred usage is to specify negative integers to be agnostic to whether a batch dimension is included.
Some sample usage cases (see the usage sketch at the end of this page):
  NHWC format: channels_axis=-1, reduction_axes=[-3, -2]
  NCHW format: channels_axis=-3, reduction_axes=[-2, -1]
center: If True, add an offset of beta to the normalized tensor. If False, beta is ignored.
scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon: Small float added to variance to avoid dividing by zero.
activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
param_initializers: Optional initializers for beta, gamma, moving mean and moving variance.
reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs to.
trainable: If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
scope: Optional scope for variable_scope.
mean_close_to_zero: The mean of input before ReLU will be close to zero when batch size >= 4k for Resnet-50 on TPU. If True, use nn.sufficient_statistics and nn.normalize_moments to calculate the variance; this is the same behavior as fused=True in batch normalization. If False, use nn.moments to calculate the variance. When the mean is close to zero (e.g. 1e-4), using the mean to calculate the variance may give poor results due to repeated round-off error and denormalization. When the mean is large (e.g. 1e2), sum(input^2) is so large that only the high-order digits of the elements are accumulated; in that case computing the variance as sum((input - mean)^2)/n is more accurate than sum(input^2)/n - mean^2, as the sketch below illustrates.
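The round-off argument above is easy to reproduce. Here is an illustrative NumPy sketch (not part of this API; the array values and sizes are invented for demonstration) comparing the two variance formulas when the mean is large:

```python
import numpy as np

# Values near 1e2 with a tiny spread, in float32; true variance ~= 1e-6.
x = (1e2 + 1e-3 * np.random.randn(10000)).astype(np.float32)
mean = x.mean(dtype=np.float32)

# Uncentered form: sum(input^2)/n - mean^2. Both terms are ~1e4, so their
# float32 difference is dominated by round-off rather than the variance.
uncentered = np.sum(x * x, dtype=np.float32) / x.size - mean * mean

# Centered form: sum((input - mean)^2)/n. Subtracting the mean first keeps
# every intermediate value on the scale of the variance itself.
centered = np.sum((x - mean) ** 2, dtype=np.float32) / x.size

print(uncentered)  # unreliable, roughly +-1e-3; can even come out negative
print(centered)    # ~= 1e-6, close to the true variance
```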
Returns:
A Tensor representing the output of the operation.
Raises:
ValueError: If the rank of inputs is undefined.
ValueError: If rank or channels dimension of inputs is undefined.
ValueError: If number of groups is not commensurate with number of channels.
ValueError: If reduction_axes or channels_axis are out of bounds.
ValueError: If reduction_axes are not mutually exclusive with channels_axis.
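For orientation, a minimal usage sketch follows. It assumes a TF 1.x environment where tf.contrib is available; the shapes and group count are illustrative:

```python
import tensorflow as tf  # TF 1.x; tf.contrib is not available in TF 2.x

# NHWC input: unknown batch size, 56x56 spatial, 64 channels. All shape
# dimensions except batch must be fully defined.
x_nhwc = tf.placeholder(tf.float32, shape=[None, 56, 56, 64])

# Channels-last layout: statistics are accumulated over height and width
# within each group. groups=32 must evenly divide the 64 channels.
y_nhwc = tf.contrib.layers.group_norm(
    x_nhwc, groups=32, channels_axis=-1, reduction_axes=(-3, -2))

# NCHW layout: the channels axis moves, and the reduction axes follow it.
x_nchw = tf.placeholder(tf.float32, shape=[None, 64, 56, 56])
y_nchw = tf.contrib.layers.group_norm(
    x_nchw, groups=32, channels_axis=-3, reduction_axes=(-2, -1))
```

Passing a groups value that does not evenly divide the channel count (e.g. groups=30 here) raises the "not commensurate" ValueError above.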