Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.
"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Can be used as a normalizer function for conv2d and fully_connected.
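Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:

    from tensorflow.python.ops import control_flow_ops

    # Run the collected moving-average updates whenever total_loss is evaluated.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    if update_ops:
      updates = tf.group(*update_ops)
      total_loss = control_flow_ops.with_dependencies([updates], total_loss)
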
One can set updates_collections=None to force the updates in place, but that can have a speed penalty, especially in distributed settings.
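
As a rough illustration of what each update op computes, the following pure-Python sketch applies one exponential-moving-average step per batch. The helper name ema_update and the starting value of 0.0 are assumptions made for this example only; it is not the library's implementation.

```python
def ema_update(moving, batch_stat, decay=0.999):
    """Return the moving statistic after one exponential-moving-average step."""
    # Algebraically the same as: moving - (1 - decay) * (moving - batch_stat)
    return decay * moving + (1.0 - decay) * batch_stat


# Hypothetical run: three batches whose mean is 2.0, with decay=0.9.
moving_mean = 0.0  # assumed starting value for this illustration
for batch_mean in [2.0, 2.0, 2.0]:
    moving_mean = ema_update(moving_mean, batch_mean, decay=0.9)

print(moving_mean)  # approximately 0.542
```

With decay=0.999 the same first step moves the estimate only to about 0.002, which is why a lower decay can help when the moving statistics lag behind the batch statistics.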
inputs: a tensor with 2 or more dimensions, where the first dimension has batch_size. The normalization is over all but the last dimension if data_format is NHWC and over all but the second dimension if data_format is NCHW.
decay: decay for the moving average. Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower the decay value (recommend trying decay=0.9) if the model experiences reasonably good training performance but poor validation and/or test performance. Try zero_debias_moving_mean=True for improved stability.
center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon: small float added to variance to avoid dividing by zero.
activation_fn: activation function, default set to None to skip it and maintain a linear activation.
param_initializers: optional initializers for beta, gamma, moving mean and moving variance.
updates_collections: collections to collect the update ops for computation. The update ops need to be executed with the train_op. If None, a control dependency is added to make sure the updates are computed in place.
is_training: whether or not the layer is in training mode. In training mode it accumulates the statistics of the moments into moving_mean and moving_variance using an exponential moving average with the given decay. When it is not in training mode, it uses the values of moving_mean and moving_variance.
reuse: whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections: optional collections for the variables.
outputs_collections: collections to add the outputs.
trainable: If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
batch_weights: An optional tensor of shape [batch_size], containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
fused: Use nn.fused_batch_norm if True, nn.batch_normalization otherwise.
data_format: A string. NHWC (default) and NCHW are supported.
zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
scope: Optional scope for variable_scope.
Returns: a Tensor representing the output of the operation.
ValueError: if batch_weights is not None and fused is True.
ValueError: if the rank of inputs is undefined.
ValueError: if rank or channels dimension of inputs is undefined.
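
The arithmetic described by the inputs, center, scale, epsilon and is_training arguments can be sketched in pure Python for a 2-D input of shape [batch_size, channels]. The function name batch_norm_2d, its signature, and the use of nested lists are assumptions made for this illustration; the layer itself delegates to nn.batch_normalization or nn.fused_batch_norm.

```python
import math


def batch_norm_2d(rows, gamma, beta, epsilon=0.001, center=True, scale=True,
                  moving_mean=None, moving_variance=None):
    """Normalize a [batch_size, channels] list of lists, one channel at a time.

    If moving_mean and moving_variance are given, they are used as the
    statistics (inference mode). Otherwise the statistics are computed from
    the batch itself (training mode).
    """
    n = len(rows)
    channels = len(rows[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        column = [row[c] for row in rows]
        if moving_mean is None:
            mean = sum(column) / n
            # Population (biased) variance over the batch dimension.
            var = sum((v - mean) ** 2 for v in column) / n
        else:
            mean, var = moving_mean[c], moving_variance[c]
        inv_std = 1.0 / math.sqrt(var + epsilon)  # epsilon guards against var == 0
        for i, v in enumerate(column):
            y = (v - mean) * inv_std
            if scale:
                y *= gamma[c]   # multiply by gamma
            if center:
                y += beta[c]    # add offset beta
            out[i][c] = y
    return out


x = [[1.0, 10.0], [3.0, 30.0]]

# Training mode: batch statistics; each channel becomes approximately [-1.0, 1.0].
train_out = batch_norm_2d(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], epsilon=0.0)

# Inference mode: stored statistics (mean 0, variance 1) leave x unchanged.
infer_out = batch_norm_2d(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], epsilon=0.0,
                          moving_mean=[0.0, 0.0], moving_variance=[1.0, 1.0])
```

The two calls differ only in where the mean and variance come from, which mirrors the distinction the is_training argument draws between batch statistics and the accumulated moving averages.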