Normalization

Normalization is useful to prevent neurons from saturating when inputs may have varying scale, and to aid generalization.

tf.nn.l2_normalize(x, dim, epsilon=1e-12, name=None)

Normalizes along dimension dim using an L2 norm.

For a 1-D tensor with dim = 0, computes

output = x / sqrt(max(sum(x**2), epsilon))

For x with more dimensions, independently normalizes each 1-D slice along dimension dim.
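The formula above can be checked with a small pure-Python sketch. It is not TensorFlow's implementation. It uses only the standard library, and the helper names l2_normalize_1d and l2_normalize_rows are hypothetical. The second helper illustrates the "independent 1-D slices" rule for a 2-D input with dim = 1.

```python
import math


def l2_normalize_1d(x, epsilon=1e-12):
    """Hypothetical helper: output = x / sqrt(max(sum(x**2), epsilon))."""
    square_sum = sum(v * v for v in x)
    # Clamping the squared sum at epsilon keeps an all-zero input from dividing by 0.
    norm = math.sqrt(max(square_sum, epsilon))
    return [v / norm for v in x]


def l2_normalize_rows(matrix, epsilon=1e-12):
    """Hypothetical helper: for a 2-D input with dim = 1, each row is one 1-D slice."""
    return [l2_normalize_1d(row, epsilon) for row in matrix]


print(l2_normalize_1d([3.0, 4.0]))  # [0.6, 0.8]
print(l2_normalize_1d([0.0, 0.0]))  # [0.0, 0.0]
```

The zero-vector case shows why epsilon exists: without the clamp the divisor would be 0.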

Args:
  • x: A Tensor.
  • dim: Dimension along which to normalize. A scalar or a vector of integers.
  • epsilon: A lower bound for the squared norm. If norm < sqrt(epsilon), sqrt(epsilon) is used as the divisor instead.
  • name: A name for this operation (optional).
Returns:

A Tensor with the same shape as x.


tf.nn.local_response_normalization(input, depth_radius=None, bias=None, alpha=None, beta=None, name=None)

Local Response Normalization.

The 4-D input tensor is treated as a 3-D array of 1-D vectors (along the last dimension), and each vector is normalized independently. Within a given vector, each component is divided by a power of the weighted sum of squared inputs that lie within depth_radius of it. In detail,

sqr_sum[a, b, c, d] =
    sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
output = input / (bias + alpha * sqr_sum) ** beta
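As an illustration, the sketch below applies these two formulas to a single depth vector input[a, b, c, :]. It is a pure-Python sketch, not TensorFlow's kernel. The helper name lrn_1d is hypothetical. It assumes the window is clipped to valid indices at both ends of the vector.

```python
def lrn_1d(vec, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5):
    """Hypothetical helper: local response normalization of one vector input[a, b, c, :]."""
    depth = len(vec)
    out = []
    for d in range(depth):
        # Clip the window [d - depth_radius, d + depth_radius] to valid indices;
        # a negative slice start would otherwise wrap around in Python.
        lo = max(0, d - depth_radius)
        hi = min(depth, d + depth_radius + 1)
        sqr_sum = sum(v * v for v in vec[lo:hi])
        out.append(vec[d] / (bias + alpha * sqr_sum) ** beta)
    return out


# With depth_radius=0 and bias=0, each component is divided by its own magnitude.
print(lrn_1d([1.0, -2.0, 2.0], depth_radius=0, bias=0.0))  # [1.0, -1.0, 1.0]
```

With the defaults (beta = 0.5), each component is divided by the square root of bias plus alpha times the local energy.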

For details, see Krizhevsky et al., ImageNet classification with deep convolutional neural networks (NIPS 2012).

Args:
  • input: A Tensor. Must be one of the following types: float32, half. 4-D.
  • depth_radius: An optional int. Defaults to 5. 0-D. Half-width of the 1-D normalization window.
  • bias: An optional float. Defaults to 1. An offset (usually positive to avoid dividing by 0).
  • alpha: An optional float. Defaults to 1. A scale factor, usually positive.
  • beta: An optional float. Defaults to 0.5. An exponent.
  • name: A name for the operation (optional).
Returns:

A Tensor. Has the same type as input.


tf.nn.sufficient_statistics(x, axes, shift=None, keep_dims=False, name=None)

Calculate the sufficient statistics for the mean and variance of x.

These sufficient statistics are computed using the one-pass algorithm on an input that's optionally shifted. See: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Computing_shifted_data

Args:
  • x: A Tensor.
  • axes: Array of ints. Axes along which to compute mean and variance.
  • shift: A Tensor containing the value by which to shift the data for numerical stability, or None if no shift is to be performed. A shift close to the true mean provides the most numerically stable results.
  • keep_dims: Produce statistics with the same dimensionality as the input.
  • name: Name used to scope the operations that compute the sufficient stats.
Returns:

Four Tensor objects of the same type as x:

  • the count (number of elements to average over).
  • the (possibly shifted) sum of the elements in the array.
  • the (possibly shifted) sum of squares of the elements in the array.
  • the shift by which the mean must be corrected, or None if shift is None.
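The four outputs listed above can be illustrated with a pure-Python sketch for a 1-D input reduced over axis 0. It uses only the standard library and is not TensorFlow's implementation. The helper name sufficient_statistics_1d is hypothetical. A single loop accumulates both sums, as in the one-pass algorithm.

```python
def sufficient_statistics_1d(x, shift=None):
    """Hypothetical helper mirroring the four outputs listed above for a 1-D input."""
    count = float(len(x))
    s = 0.0 if shift is None else shift
    # One pass over the (optionally shifted) data accumulates both sums.
    mean_ss = 0.0
    variance_ss = 0.0
    for v in x:
        d = v - s
        mean_ss += d
        variance_ss += d * d
    return count, mean_ss, variance_ss, shift


print(sufficient_statistics_1d([1.0, 2.0, 3.0, 4.0], shift=2.0))  # (4.0, 2.0, 6.0, 2.0)
```

Shifting by a value near the true mean (here 2.0 versus the true mean 2.5) keeps both sums small, which is the source of the improved numerical stability.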

tf.nn.normalize_moments(counts, mean_ss, variance_ss, shift, name=None)

Calculate the mean and variance based on the sufficient statistics.

Args:
  • counts: A Tensor containing the total count of the data (one value).
  • mean_ss: A Tensor containing the mean sufficient statistics: the (possibly shifted) sum of the elements to average over.
  • variance_ss: A Tensor containing the variance sufficient statistics: the (possibly shifted) sum of squares of the data to compute the variance over.
  • shift: A Tensor containing the value by which the data is shifted for numerical stability, or None if no shift was performed.
  • name: Name used to scope the operations that compute the moments.
Returns:

Two Tensor objects: mean and variance.
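A minimal pure-Python sketch of this computation follows, assuming scalar statistics for a single reduction. It is not TensorFlow's implementation, and the helper name normalize_moments_1d is hypothetical. The shift is added back to the mean only, because the variance is unchanged by shifting the data.

```python
def normalize_moments_1d(counts, mean_ss, variance_ss, shift=None):
    """Hypothetical helper: mean and variance from (possibly shifted) sufficient statistics."""
    shifted_mean = mean_ss / counts
    mean = shifted_mean if shift is None else shifted_mean + shift
    # Variance does not depend on the shift: E[(x - s)^2] - (E[x - s])^2.
    variance = variance_ss / counts - shifted_mean ** 2
    return mean, variance


# Statistics of [1, 2, 3, 4] shifted by 2: count 4, shifted sum 2, shifted sum of squares 6.
print(normalize_moments_1d(4.0, 2.0, 6.0, shift=2.0))  # (2.5, 1.25)
```

The unshifted statistics of the same data (count 4, sum 10, sum of squares 30, shift None) give the same result, 2.5 and 1.25.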


tf.nn.moments(x, axes, shift=None, name=None, keep_dims=False)

Calculate the mean and variance of x.

The mean and variance are calculated by aggregating the contents of x across axes. If x is 1-D and axes = [0] this is just the mean and variance of a vector.

When using these moments for batch normalization (see tf.nn.batch_normalization):

  • for so-called "global normalization", used with convolutional layers whose activations have shape [batch, height, width, depth], pass axes=[0, 1, 2].
  • for simple batch normalization pass axes=[0] (batch only).
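The two axes choices above can be illustrated with a pure-Python sketch on nested lists. It uses only the standard library and is not TensorFlow's implementation. The helper names moments_axis0 and moments_global are hypothetical. The variance is the population variance, meaning the mean of squared deviations from the mean.

```python
def moments_axis0(batch):
    """Hypothetical helper: per-column mean and variance of a [batch, depth] list (axes=[0])."""
    n = len(batch)
    depth = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(depth)]
    # Population variance (divide by n), averaging squared deviations from the mean.
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(depth)]
    return means, variances


def moments_global(images):
    """Hypothetical helper: axes=[0, 1, 2] on a [batch, height, width, depth] list."""
    # Every (batch, row, column) position contributes one depth vector.
    pixels = [pixel for image in images for row in image for pixel in row]
    return moments_axis0(pixels)


print(moments_axis0([[1.0, 10.0], [3.0, 10.0]]))  # ([2.0, 10.0], [1.0, 0.0])
```

Global normalization reduces to the simple case once the batch, height and width axes are flattened together, leaving one mean and one variance per depth channel.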
Args:
  • x: A Tensor.
  • axes: Array of ints. Axes along which to compute mean and variance.
  • shift: A Tensor containing the value by which to shift the data for numerical stability, or None in which case the true mean of the data is used as shift. A shift close to the true mean provides the most numerically stable results.
  • name: Name used to scope the operations that compute the moments.
  • keep_dims: Produce moments with the same dimensionality as the input.
Returns:

Two Tensor objects: mean and variance.


tf.nn.weighted_moments(x, axes, frequency_weights, name=None, keep_dims=False)

Returns the frequency-weighted mean and variance of x.

Args:
  • x: A tensor.
  • axes: A 1-D tensor of int32 values; these are the axes along which to compute mean and variance.
  • frequency_weights: A tensor of positive weights which can be broadcast with x.
  • name: Name used to scope the operation.
  • keep_dims: Produce moments with the same dimensionality as the input.
Returns:

Two tensors: weighted_mean and weighted_variance.
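A 1-D pure-Python sketch of the weighted moments follows. It is not TensorFlow's implementation, and the helper name weighted_moments_1d is hypothetical. It assumes both the mean and the variance are normalized by the sum of the weights. Under that assumption, a frequency weight of k behaves like repeating the element k times.

```python
def weighted_moments_1d(x, frequency_weights):
    """Hypothetical helper: frequency-weighted mean and variance of a 1-D list."""
    total = sum(frequency_weights)
    mean = sum(w * v for w, v in zip(frequency_weights, x)) / total
    # Weighted average of squared deviations (divided by the total weight, not total - 1).
    variance = sum(w * (v - mean) ** 2 for w, v in zip(frequency_weights, x)) / total
    return mean, variance


# A weight of 3 behaves like repeating the value 3 times: same as [1, 1, 1, 3].
print(weighted_moments_1d([1.0, 3.0], [3.0, 1.0]))  # (1.5, 0.75)
```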


tf.nn.fused_batch_norm(x, scale, offset, mean=None, variance=None, epsilon=0.001, data_format='NHWC', is_training=True, name=None)

Batch normalization.

As described in http://arxiv.org/abs/1502.03167.

Args:
  • x: Input Tensor of 4 dimensions.
  • scale: A Tensor of 1 dimension for scaling.
  • offset: A Tensor of 1 dimension for bias.
  • mean: A Tensor of 1 dimension for population mean used for inference.
  • variance: A Tensor of 1 dimension for population variance used for inference.
  • epsilon: A small float number added to the variance of x.
  • data_format: The data format for x. Either "NHWC" (default) or "NCHW".
  • is_training: A bool value to specify if the operation is used for training or inference.
  • name: A name for this operation (optional).
Returns:
  • y: A 4D Tensor for the normalized, scaled, offset x.
  • batch_mean: A 1D Tensor for the mean of x.
  • batch_var: A 1D Tensor for the variance of x.
Raises:
  • ValueError: If mean or variance is not None when is_training is True.

tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)

Batch normalization.

As described in http://arxiv.org/abs/1502.03167. Normalizes a tensor by mean and variance, and applies (optionally) a scale \(\gamma\) to it, as well as an offset \(\beta\):

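\(\frac{\gamma(x-\mu)}{\sqrt{\sigma^2+\epsilon}}+\beta\)

where \(\mu\) is mean, \(\sigma^2\) is variance and \(\epsilon\) is variance_epsilon.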

mean, variance, offset and scale are all expected to be of one of two shapes:

  • In full generality, they can have the same number of dimensions as the input x, with identical sizes as x for the dimensions that are not normalized over (the 'depth' dimension(s)), and dimension 1 for the others which are being normalized over. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=True) during training, or running averages thereof during inference.
  • In the common case where the 'depth' dimension is the last dimension in the input tensor x, they may be one dimensional tensors of the same size as the 'depth' dimension. This is the case for example for the common [batch, depth] layout of fully-connected layers, and [batch, height, width, depth] for convolutions. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=False) during training, or running averages thereof during inference.
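The common [batch, depth] case can be illustrated with a pure-Python sketch of the formula above, using 1-D per-depth parameters. It is not TensorFlow's implementation, and the helper name batch_normalization_2d is hypothetical. Passing None for scale or offset skips that step, which matches the optional \(\gamma\) and \(\beta\).

```python
import math


def batch_normalization_2d(x, mean, variance, offset, scale, variance_epsilon):
    """Hypothetical helper: [batch, depth] input with 1-D per-depth parameters."""
    out = []
    for row in x:
        new_row = []
        for j, v in enumerate(row):
            # Normalize by the standard deviation, with epsilon guarding against 0.
            y = (v - mean[j]) / math.sqrt(variance[j] + variance_epsilon)
            if scale is not None:  # gamma
                y *= scale[j]
            if offset is not None:  # beta
                y += offset[j]
            new_row.append(y)
        out.append(new_row)
    return out


x = [[1.0, 6.0], [3.0, 14.0]]
print(batch_normalization_2d(x, [2.0, 10.0], [1.0, 16.0], None, None, 1e-3))
```

With a small variance_epsilon, each column above comes out close to -1 and +1, that is, zero mean and unit variance per depth index.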
Args:
  • x: Input Tensor of arbitrary dimensionality.
  • mean: A mean Tensor.
  • variance: A variance Tensor.
  • offset: An offset Tensor, often denoted \(\beta\) in equations, or None. If present, will be added to the normalized tensor.
  • scale: A scale Tensor, often denoted \(\gamma\) in equations, or None. If present, the scale is applied to the normalized tensor.
  • variance_epsilon: A small float number to avoid dividing by 0.
  • name: A name for this operation (optional).
Returns:

The normalized, scaled, offset tensor.


tf.nn.batch_norm_with_global_normalization(t, m, v, beta, gamma, variance_epsilon, scale_after_normalization, name=None)

Batch normalization.

This op is deprecated. See tf.nn.batch_normalization.

Args:
  • t: A 4D input Tensor.
  • m: A 1D mean Tensor with size matching the last dimension of t. This is the first output from tf.nn.moments, or a saved moving average thereof.
  • v: A 1D variance Tensor with size matching the last dimension of t. This is the second output from tf.nn.moments, or a saved moving average thereof.
  • beta: A 1D beta Tensor with size matching the last dimension of t. An offset to be added to the normalized tensor.
  • gamma: A 1D gamma Tensor with size matching the last dimension of t. If "scale_after_normalization" is true, this tensor will be multiplied with the normalized tensor.
  • variance_epsilon: A small float number to avoid dividing by 0.
  • scale_after_normalization: A bool indicating whether the resulting tensor needs to be multiplied with gamma.
  • name: A name for this operation (optional).
Returns:

A batch-normalized t.