Pooling

The pooling ops sweep a rectangular window over the input tensor and compute a reduction (average, max, or max with argmax) for each window. Each pooling op uses rectangular windows of size ksize separated by offset strides. For example, if strides is all ones, every window is used; if strides is all twos, every other window is used in each dimension; and so on.

In detail, the output is

output[i] = reduce(value[strides * i:strides * i + ksize])

where the indices also take into consideration the padding values. Please refer to the Convolution section for details about the padding calculation.
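
The window formula above can be sketched in one dimension. This is a minimal pure-Python illustration, not the TensorFlow implementation: the helper names pool_1d and mean are hypothetical, and the sketch assumes 'VALID'-style behaviour in which only windows that fit entirely inside the input are used.

```python
def pool_1d(value, ksize, stride, reduce_fn):
    """Apply reduce_fn to each length-ksize window of value, stepping by stride.

    Assumption: only windows that fit entirely inside value are used
    (no padding is added).
    """
    if len(value) < ksize:
        return []
    n_out = (len(value) - ksize) // stride + 1
    # output[i] = reduce(value[stride * i : stride * i + ksize])
    return [reduce_fn(value[stride * i: stride * i + ksize]) for i in range(n_out)]


def mean(window):
    return sum(window) / len(window)


value = [1.0, 3.0, 2.0, 8.0, 4.0, 6.0]
max_s1 = pool_1d(value, ksize=2, stride=1, reduce_fn=max)   # every window
max_s2 = pool_1d(value, ksize=2, stride=2, reduce_fn=max)   # every other window
avg_s2 = pool_1d(value, ksize=2, stride=2, reduce_fn=mean)
print(max_s1)  # [3.0, 3.0, 8.0, 8.0, 6.0]
print(max_s2)  # [3.0, 8.0, 6.0]
print(avg_s2)  # [2.0, 5.0, 5.0]
```

With a stride of 1 all five windows contribute; with a stride of 2 only windows 0, 2 and 4 do, which is the "every other window" case described above.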

tf.nn.avg_pool(value, ksize, strides, padding, data_format='NHWC', name=None)

Performs the average pooling on the input.

Each entry in output is the mean of the corresponding size ksize window in value.

Args:
  • value: A 4-D Tensor of shape [batch, height, width, channels] and type float32, float64, qint8, quint8, or qint32.
  • ksize: A list of ints that has length >= 4. The size of the window for each dimension of the input tensor.
  • strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.
  • padding: A string, either 'VALID' or 'SAME'. The padding algorithm. See the Convolution section for details of the padding calculation.
  • data_format: A string. 'NHWC' and 'NCHW' are supported.
  • name: Optional name for the operation.
Returns:

A Tensor with the same type as value. The average pooled output tensor.
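
As a rough illustration of what "the mean of the corresponding size ksize window" means for a 4-D input, the following pure-Python sketch pools a nested list laid out as [batch, height, width, channels]. It is a reference sketch under stated assumptions, not the op itself: the function name avg_pool_nhwc_valid is hypothetical, only 'VALID'-style windows are handled, and ksize and strides are assumed to be 1 on the batch and channels dimensions.

```python
def avg_pool_nhwc_valid(value, ksize, strides):
    """Average-pool a nested list of shape [batch, height, width, channels].

    Assumptions: no padding (only windows fully inside the input), and
    ksize[0] == ksize[3] == strides[0] == strides[3] == 1.
    """
    _, kh, kw, _ = ksize
    _, sh, sw, _ = strides
    height, width = len(value[0]), len(value[0][0])
    channels = len(value[0][0][0])
    out_h = (height - kh) // sh + 1
    out_w = (width - kw) // sw + 1
    output = []
    for image in value:
        rows = []
        for i in range(out_h):
            cols = []
            for j in range(out_w):
                cell = []
                for c in range(channels):
                    # Collect the kh x kw window for this output position.
                    window = [image[y][x][c]
                              for y in range(i * sh, i * sh + kh)
                              for x in range(j * sw, j * sw + kw)]
                    cell.append(sum(window) / len(window))
                cols.append(cell)
            rows.append(cols)
        output.append(rows)
    return output


# One 4x4 single-channel image holding 1..16 in row-major order.
image = [[[[float(4 * y + x + 1)] for x in range(4)] for y in range(4)]]
pooled = avg_pool_nhwc_valid(image, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
print(pooled)  # [[[[3.5], [5.5]], [[11.5], [13.5]]]]
```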


tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)

Performs the max pooling on the input.

Args:
  • value: A 4-D Tensor with shape [batch, height, width, channels] and type tf.float32.
  • ksize: A list of ints that has length >= 4. The size of the window for each dimension of the input tensor.
  • strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.
  • padding: A string, either 'VALID' or 'SAME'. The padding algorithm. See the Convolution section for details of the padding calculation.
  • data_format: A string. 'NHWC' and 'NCHW' are supported.
  • name: Optional name for the operation.
Returns:

A Tensor with type tf.float32. The max pooled output tensor.
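
The height and width of the pooled output depend on the padding argument. The sketch below restates the usual 'VALID' and 'SAME' output-size rules for one spatial dimension. These formulas are an assumption here, since the Convolution section that defines the padding calculation is not reproduced in this section, and pooled_size is a hypothetical helper name.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def pooled_size(in_size, ksize, stride, padding):
    """Output length along one spatial dimension (assumed padding rules)."""
    if padding == "VALID":
        # Only windows that fit entirely inside the input are used.
        return max(0, ceil_div(in_size - ksize + 1, stride))
    if padding == "SAME":
        # The input is padded so that every stride position yields a window.
        return ceil_div(in_size, stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


print(pooled_size(5, ksize=2, stride=2, padding="VALID"))  # 2
print(pooled_size(5, ksize=2, stride=2, padding="SAME"))   # 3
```

Under these rules a 5-wide input pooled with ksize 2 and stride 2 loses its last column with 'VALID', while 'SAME' keeps it in a third, partially padded window.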


tf.nn.max_pool_with_argmax(input, ksize, strides, padding, Targmax=None, name=None)

Performs max pooling on the input and outputs both max values and indices.

The indices in argmax are flattened, so that a maximum value at position [b, y, x, c] becomes flattened index ((b * height + y) * width + x) * channels + c.
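
The flattening formula can be written out directly, together with its inverse, which is often what is needed when mapping an argmax entry back to a position. Both function names below are hypothetical helpers for illustration and are not part of tf.nn.

```python
def flatten_argmax_index(b, y, x, c, height, width, channels):
    """Flattened index of position [b, y, x, c], as in the formula above."""
    return ((b * height + y) * width + x) * channels + c


def unflatten_argmax_index(index, height, width, channels):
    """Recover (b, y, x, c) from a flattened index."""
    index, c = divmod(index, channels)
    index, x = divmod(index, width)
    b, y = divmod(index, height)
    return b, y, x, c


flat = flatten_argmax_index(b=1, y=2, x=3, c=0, height=4, width=5, channels=2)
print(flat)                                                         # 66
print(unflatten_argmax_index(flat, height=4, width=5, channels=2))  # (1, 2, 3, 0)
```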

Args:
  • input: A Tensor. Must be one of the following types: float32, half. 4-D with shape [batch, height, width, channels]. Input to pool over.
  • ksize: A list of ints that has length >= 4. The size of the window for each dimension of the input tensor.
  • strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.
  • padding: A string from: "SAME", "VALID". The type of padding algorithm to use.
  • Targmax: An optional tf.DType from: tf.int32, tf.int64. Defaults to tf.int64.
  • name: A name for the operation (optional).
Returns:

A tuple of Tensor objects (output, argmax).

  • output: A Tensor. Has the same type as input. The max pooled output tensor.
  • argmax: A Tensor of type Targmax. 4-D. The flattened indices of the max values chosen for each output.

tf.nn.avg_pool3d(input, ksize, strides, padding, name=None)

Performs 3D average pooling on the input.

Args:
  • input: A Tensor. Must be one of the following types: float32, float64, int64, int32, uint8, uint16, int16, int8, complex64, complex128, qint8, quint8, qint32, half. Shape [batch, depth, rows, cols, channels] tensor to pool over.
  • ksize: A list of ints that has length >= 5. The size of the window for each dimension of the input tensor. Must have ksize[0] = ksize[4] = 1.
  • strides: A list of ints that has length >= 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1.
  • padding: A string from: "SAME", "VALID". The type of padding algorithm to use.
  • name: A name for the operation (optional).
Returns:

A Tensor. Has the same type as input. The average pooled output tensor.
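
The constraints on ksize and strides for the 3D ops can be checked before building the graph. The helper below is a hypothetical pre-flight check written only from the argument descriptions above; it is not part of TensorFlow, and the op performs its own validation.

```python
def pool3d_arg_problems(ksize, strides):
    """Return a list of problems found in ksize/strides (empty if none).

    Checks only what the argument descriptions state: length >= 5, and a
    value of 1 on the batch (index 0) and channels (index 4) dimensions.
    """
    problems = []
    for name, arg in (("ksize", ksize), ("strides", strides)):
        if len(arg) < 5:
            problems.append(name + " must have length >= 5")
        elif arg[0] != 1 or arg[4] != 1:
            problems.append(name + "[0] and " + name + "[4] must both be 1")
    return problems


ok = pool3d_arg_problems(ksize=[1, 2, 2, 2, 1], strides=[1, 2, 2, 2, 1])
bad = pool3d_arg_problems(ksize=[2, 2, 2, 2, 1], strides=[1, 2, 2])
print(ok)   # []
print(bad)  # ['ksize[0] and ksize[4] must both be 1', 'strides must have length >= 5']
```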


tf.nn.max_pool3d(input, ksize, strides, padding, name=None)

Performs 3D max pooling on the input.

Args:
  • input: A Tensor. Must be one of the following types: float32, float64, int64, int32, uint8, uint16, int16, int8, complex64, complex128, qint8, quint8, qint32, half. Shape [batch, depth, rows, cols, channels] tensor to pool over.
  • ksize: A list of ints that has length >= 5. The size of the window for each dimension of the input tensor. Must have ksize[0] = ksize[4] = 1.
  • strides: A list of ints that has length >= 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1.
  • padding: A string from: "SAME", "VALID". The type of padding algorithm to use.
  • name: A name for the operation (optional).
Returns:

A Tensor. Has the same type as input. The max pooled output tensor.


tf.nn.fractional_avg_pool(value, pooling_ratio, pseudo_random=None, overlapping=None, deterministic=None, seed=None, seed2=None, name=None)

Performs fractional average pooling on the input.

Fractional average pooling generates its pooling regions in the same way as fractional max pooling. The only difference is that, once the pooling regions are generated, a mean operation is performed in each region instead of a max operation.

Args:
  • value: A Tensor. Must be one of the following types: float32, float64, int32, int64. 4-D with shape [batch, height, width, channels].
  • pooling_ratio: A list of floats that has length >= 4. Pooling ratio for each dimension of value. Currently only the row and col dimensions are supported, and each ratio should be >= 1.0. For example, a valid pooling ratio looks like [1.0, 1.44, 1.73, 1.0]. The first and last elements must be 1.0 because pooling on the batch and channels dimensions is not allowed. 1.44 and 1.73 are the pooling ratios on the height and width dimensions respectively.
  • pseudo_random: An optional bool. Defaults to False. When set to True, generates the pooling sequence in a pseudorandom fashion; otherwise, in a random fashion. See the paper [Benjamin Graham, Fractional Max-Pooling] (http://arxiv.org/abs/1412.6071) for the difference between pseudorandom and random.
  • overlapping: An optional bool. Defaults to False. When set to True, it means when pooling, the values at the boundary of adjacent pooling cells are used by both cells. For example:

    index 0 1 2 3 4

    value 20 5 16 3 7

    If the pooling sequence is [0, 2, 4], then the value 16 at index 2 will be used twice. The result would be [41/3, 26/3] for fractional avg pooling.

  • deterministic: An optional bool. Defaults to False. When set to True, a fixed pooling region will be used when iterating over a FractionalAvgPool node in the computation graph. Mainly used in unit tests to make FractionalAvgPool deterministic.

  • seed: An optional int. Defaults to 0. If either seed or seed2 is set to a non-zero value, the random number generator is seeded by the given seed. Otherwise, it is seeded by a random seed.
  • seed2: An optional int. Defaults to 0. A second seed to avoid seed collision.
  • name: A name for the operation (optional).
Returns:

A tuple of Tensor objects (output, row_pooling_sequence, col_pooling_sequence).

  • output: A Tensor. Has the same type as value. Output tensor after fractional avg pooling.
  • row_pooling_sequence: A Tensor of type int64. Row pooling sequence, needed to calculate the gradient.
  • col_pooling_sequence: A Tensor of type int64. Column pooling sequence, needed to calculate the gradient.
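
The overlapping example given for the overlapping argument can be reproduced with a few lines of pure Python. This is a sketch that mirrors that one-dimensional example only. The function name overlapping_avg_cells is hypothetical, and the rule that an overlapping cell includes the element at the next boundary index is inferred from the example rather than taken from the op's implementation.

```python
from fractions import Fraction


def overlapping_avg_cells(values, pooling_sequence):
    """Mean of each pooling cell when adjacent cells share their boundary.

    Assumption (inferred from the documented example): the cell that starts
    at pooling_sequence[i] also includes the element at index
    pooling_sequence[i + 1].
    """
    results = []
    for start, end in zip(pooling_sequence, pooling_sequence[1:]):
        cell = values[start:end + 1]
        # Fractions keep results such as 41/3 exact instead of rounding.
        results.append(Fraction(sum(cell), len(cell)))
    return results


values = [20, 5, 16, 3, 7]
averages = overlapping_avg_cells(values, [0, 2, 4])
print(averages)  # [Fraction(41, 3), Fraction(26, 3)]
```

The first cell covers 20, 5 and 16, and the second covers 16, 3 and 7, so the value at index 2 contributes to both means.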

tf.nn.fractional_max_pool(value, pooling_ratio, pseudo_random=None, overlapping=None, deterministic=None, seed=None, seed2=None, name=None)

Performs fractional max pooling on the input.

Fractional max pooling is slightly different from regular max pooling. In regular max pooling, you downsize an input set by taking the maximum value of smaller N x N subsections of the set (often 2x2), and try to reduce the set by a factor of N, where N is an integer. Fractional max pooling, as you might expect from the word "fractional", means that the overall reduction ratio N does not have to be an integer.

The sizes of the pooling regions are generated randomly but are fairly uniform. For example, let's look at the height dimension, and the constraints on the list of rows that will be pool boundaries.

First we define the following:

  1. input_row_length : the number of rows from the input set
  2. output_row_length : the number of rows in the output, which will be smaller than the input
  3. alpha = input_row_length / output_row_length : our reduction ratio
  4. K = floor(alpha)
  5. row_pooling_sequence : this is the result list of pool boundary rows

Then row_pooling_sequence, written a below, should satisfy:

  1. a[0] = 0 : the first value of the sequence is 0
  2. a[end] = input_row_length : the last value of the sequence is the number of input rows
  3. K <= (a[i+1] - a[i]) <= K+1 : all intervals are K or K+1 size
  4. length(row_pooling_sequence) = output_row_length+1
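
The four constraints above can be sketched as a generator and a checker. This is an illustrative construction only: it simply shuffles intervals of size K and K+1, which satisfies the constraints but is not claimed to be the method used by the op or by the paper. The names make_pooling_sequence and satisfies_constraints are hypothetical, and the sketch assumes 1 <= output_length <= input_length.

```python
import random


def make_pooling_sequence(input_length, output_length, rng):
    """Build one boundary sequence that meets the four constraints."""
    k = input_length // output_length             # K = floor(alpha)
    num_long = input_length - k * output_length   # how many intervals need size K + 1
    intervals = [k + 1] * num_long + [k] * (output_length - num_long)
    rng.shuffle(intervals)
    sequence = [0]
    for size in intervals:
        sequence.append(sequence[-1] + size)
    return sequence


def satisfies_constraints(sequence, input_length, output_length):
    """Check constraints 1 to 4 from the list above."""
    k = input_length // output_length
    return (sequence[0] == 0
            and sequence[-1] == input_length
            and all(k <= b - a <= k + 1 for a, b in zip(sequence, sequence[1:]))
            and len(sequence) == output_length + 1)


seq = make_pooling_sequence(input_length=10, output_length=7, rng=random.Random(0))
print(seq, satisfies_constraints(seq, 10, 7))
```

For 10 input rows and 7 output rows, alpha is about 1.43 and K is 1, so the sequence is made of four intervals of size 1 and three of size 2 in some order.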

For more details on fractional max pooling, see this paper: [Benjamin Graham, Fractional Max-Pooling] (http://arxiv.org/abs/1412.6071)

Args:
  • value: A Tensor. Must be one of the following types: float32, float64, int32, int64. 4-D with shape [batch, height, width, channels].
  • pooling_ratio: A list of floats that has length >= 4. Pooling ratio for each dimension of value. Currently only the row and col dimensions are supported, and each ratio should be >= 1.0. For example, a valid pooling ratio looks like [1.0, 1.44, 1.73, 1.0]. The first and last elements must be 1.0 because pooling on the batch and channels dimensions is not allowed. 1.44 and 1.73 are the pooling ratios on the height and width dimensions respectively.
  • pseudo_random: An optional bool. Defaults to False. When set to True, generates the pooling sequence in a pseudorandom fashion; otherwise, in a random fashion. See the paper [Benjamin Graham, Fractional Max-Pooling] (http://arxiv.org/abs/1412.6071) for the difference between pseudorandom and random.
  • overlapping: An optional bool. Defaults to False. When set to True, it means when pooling, the values at the boundary of adjacent pooling cells are used by both cells. For example:

    index 0 1 2 3 4

    value 20 5 16 3 7

    If the pooling sequence is [0, 2, 4], then the value 16 at index 2 will be used twice. The result would be [20, 16] for fractional max pooling.

  • deterministic: An optional bool. Defaults to False. When set to True, a fixed pooling region will be used when iterating over a FractionalMaxPool node in the computation graph. Mainly used in unit tests to make FractionalMaxPool deterministic.

  • seed: An optional int. Defaults to 0. If either seed or seed2 is set to a non-zero value, the random number generator is seeded by the given seed. Otherwise, it is seeded by a random seed.
  • seed2: An optional int. Defaults to 0. A second seed to avoid seed collision.
  • name: A name for the operation (optional).
Returns:

A tuple of Tensor objects (output, row_pooling_sequence, col_pooling_sequence).

  • output: A Tensor. Has the same type as value. Output tensor after fractional max pooling.
  • row_pooling_sequence: A Tensor of type int64. Row pooling sequence, needed to calculate the gradient.
  • col_pooling_sequence: A Tensor of type int64. Column pooling sequence, needed to calculate the gradient.
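
The overlapping example from the Args list can be checked with a short pure-Python sketch. It covers only that one-dimensional example: fractional_max_cells is a hypothetical helper, and the treatment of the shared boundary element is inferred from the example rather than taken from the op's implementation.

```python
def fractional_max_cells(values, pooling_sequence, overlapping):
    """Max of each pooling cell delimited by pooling_sequence.

    Assumption (inferred from the documented example): with overlapping=True
    a cell also includes the element at its end boundary index; with
    overlapping=False it stops just before that index.
    """
    results = []
    for start, end in zip(pooling_sequence, pooling_sequence[1:]):
        stop = end + 1 if overlapping else end
        results.append(max(values[start:stop]))
    return results


values = [20, 5, 16, 3, 7]
with_overlap = fractional_max_cells(values, [0, 2, 4], overlapping=True)
without_overlap = fractional_max_cells(values, [0, 2, 4], overlapping=False)
print(with_overlap)     # [20, 16]
print(without_overlap)  # [20, 16]
```

For this particular data the two settings give the same maxima, because the shared value 16 is smaller than 20 in the first cell and is the largest value in the second cell either way. The settings differ only when the boundary element is the largest value in the cell that precedes it.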