Functional interface for the group normalization layer.
tf.contrib.layers.group_norm(
inputs, groups=32, channels_axis=-1, reduction_axes=(-3, -2), center=True,
scale=True, epsilon=1e-06, activation_fn=None, param_initializers=None,
reuse=None, variables_collections=None, outputs_collections=None,
trainable=True, scope=None, mean_close_to_zero=False
)
Reference: "Group Normalization", Yuxin Wu, Kaiming He. https://arxiv.org/abs/1803.08494
Args | |
---|---|
`inputs` | A `Tensor` with at least 2 dimensions, one of which is the channels dimension. All shape dimensions except for batch must be fully defined. |
`groups` | Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must evenly divide the number of channels in `inputs`. |
`channels_axis` | An integer. Specifies the index of the channels axis, which will be broken into `groups`; statistics are computed across each of these groups. Must be mutually exclusive with `reduction_axes`. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included. |
`reduction_axes` | Tuple of integers. Specifies the dimensions over which statistics will be accumulated. Must be mutually exclusive with `channels_axis`. Statistics are only accumulated across the axes specified in `reduction_axes` and `channels_axis`. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included. Sample usage: NHWC format: `channels_axis=-1, reduction_axes=[-3, -2]`; NCHW format: `channels_axis=-3, reduction_axes=[-2, -1]`. A sketch of the NCHW case follows this table. |
`center` | If True, add offset of `beta` to the normalized tensor. If False, `beta` is ignored. |
`scale` | If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer. |
`epsilon` | Small float added to the variance to avoid dividing by zero. |
`activation_fn` | Activation function, default set to None to skip it and maintain a linear activation. |
`param_initializers` | Optional initializers for beta, gamma, moving mean and moving variance. |
`reuse` | Whether or not the layer and its variables should be reused. To be able to reuse the layer, `scope` must be given. |
`variables_collections` | Optional collections for the variables. |
`outputs_collections` | Collections to add the outputs to. |
`trainable` | If True, also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`). |
`scope` | Optional scope for `variable_scope`. |
`mean_close_to_zero` | The mean of `input` before ReLU will be close to zero when batch size >= 4k for ResNet-50 on TPU. If True, use `nn.sufficient_statistics` and `nn.normalize_moments` to calculate the variance. This is the same behavior as `fused=True` in batch normalization. If False, use `nn.moments` to calculate the variance. When the mean is close to zero, e.g. 1e-4, using the mean to calculate the variance may give poor results due to repeated round-off error and denormalization in the mean. When the mean is large, e.g. 1e2, sum(`input`^2) is so large that only the high-order digits of the elements are accumulated; thus, computing the variance as sum((`input` - mean)^2)/n is more accurate than (sum(`input`^2)/n - mean^2) when the mean is large. A numerical sketch of the large-mean case appears at the end of this page. |
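As noted in the `reduction_axes` row above, the same call adapts to NCHW by moving the channels axis; a hedged sketch (the shape and `groups` value are made up for illustration):

```python
import tensorflow as tf  # TF 1.x

# NCHW: channels immediately after batch, spatial axes last.
# Negative indices keep the call valid with or without a batch dimension.
x_nchw = tf.placeholder(tf.float32, shape=[None, 16, 32, 32])
y_nchw = tf.contrib.layers.group_norm(
    x_nchw, groups=4, channels_axis=-3, reduction_axes=[-2, -1])
```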
Returns | |
---|---|
A `Tensor` representing the output of the operation. |
Raises | |
---|---|
`ValueError` | If the rank of `inputs` is undefined. |
`ValueError` | If the rank or channels dimension of `inputs` is undefined. |
`ValueError` | If the number of `groups` does not evenly divide the number of channels. |
`ValueError` | If `reduction_axes` or `channels_axis` are out of bounds. |
`ValueError` | If `reduction_axes` are not mutually exclusive with `channels_axis`. |
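A small NumPy-only sketch (not part of the `group_norm` API; the data is synthetic) of the numerical point made for `mean_close_to_zero`: with a large mean, the one-pass formula sum(x^2)/n - mean^2 loses the variance to cancellation in float32, while the mean-subtracted form keeps it.

```python
import numpy as np

rng = np.random.RandomState(0)
# float32 values with a large mean (~1e2) and a small true variance (~1e-4).
x = (100.0 + 0.01 * rng.randn(10000)).astype(np.float32)

mean = x.mean(dtype=np.float32)

# One-pass formula: sum(x^2)/n - mean^2. Both terms are ~1e4, so their
# ~1e-4 difference falls below float32 resolution and is mostly noise.
var_one_pass = np.sum(x * x, dtype=np.float32) / np.float32(x.size) - mean * mean

# Two-pass formula: sum((x - mean)^2)/n. Subtracting the mean first keeps
# the squared deviations small, so the result stays close to 1e-4.
var_two_pass = np.mean((x - mean) ** 2, dtype=np.float32)

print(var_one_pass, var_two_pass)
```

This mirrors the second half of the `mean_close_to_zero` description: when the mean is large, sum((`input` - mean)^2)/n is the more accurate way to compute the variance.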