The convolution ops sweep a 2-D filter over a batch of images, applying the filter to each window of each image of the appropriate size. The different ops trade off between generic and specific filters:

- `conv2d`: Arbitrary filters that can mix channels together.
- `depthwise_conv2d`: Filters that operate on each channel independently.
- `separable_conv2d`: A depthwise spatial filter followed by a pointwise filter.

Note that although these ops are called "convolution", they are strictly speaking "cross-correlation" since the filter is combined with an input window without reversing the filter. For details, see the properties of cross-correlation.
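To make the distinction concrete, here is a minimal pure-Python sketch in 1-D. It is an illustration only, not how TensorFlow computes anything, and the function names are hypothetical. Cross-correlation slides the filter as-is, while textbook convolution reverses the filter first:

```python
def cross_correlate_1d(x, w):
    """What these ops compute: slide w over x without flipping it (VALID, stride 1)."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + d] * w[d] for d in range(len(w))) for i in range(n_out)]


def convolve_1d(x, w):
    """Textbook convolution: the same sliding sum, but with w reversed."""
    return cross_correlate_1d(x, w[::-1])


x = [1, 2, 3, 4]
w = [1, 0, -1]
print(cross_correlate_1d(x, w))  # [-2, -2]
print(convolve_1d(x, w))         # [2, 2]
```

For a symmetric filter the two results agree. For learned filters, the difference only amounts to storing the filter reversed.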

The filter is applied to image patches of the same size as the filter and strided according to the `strides` argument. `strides = [1, 1, 1, 1]` applies the filter to a patch at every offset, `strides = [1, 2, 2, 1]` applies the filter to every other image patch in each dimension, etc.
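Along one spatial dimension, `strides[1]` (height) or `strides[2]` (width) is the step between the starting offsets of consecutive patches. The following is a minimal sketch that ignores padding for simplicity; that is an assumption, since padding is described below. `patch_offsets` is a hypothetical helper, not a TensorFlow function:

```python
def patch_offsets(in_size, filter_size, stride):
    """Starting offsets of the patches visited along one spatial dimension.

    Ignores padding: a patch is used only if it fits entirely inside the input.
    """
    return list(range(0, in_size - filter_size + 1, stride))


print(patch_offsets(6, 2, 1))  # [0, 1, 2, 3, 4]: a patch at every offset
print(patch_offsets(6, 2, 2))  # [0, 2, 4]: every other patch
```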

Ignoring channels for the moment, assume that the 4-D `input` has shape `[batch, in_height, in_width, ...]` and the 4-D `filter` has shape `[filter_height, filter_width, ...]`. The spatial semantics of the convolution ops are then as follows. First, the output size and the padding pixels are computed according to the chosen padding scheme, `'SAME'` or `'VALID'`. For `'SAME'` padding, the output height and width are computed as:

```
out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))
```

and the padding on each side is computed as:

```
pad_along_height = max((out_height - 1) * strides[1] +
                       filter_height - in_height, 0)
pad_along_width = max((out_width - 1) * strides[2] +
                      filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
```

Note that because of the division by 2, the padding on opposite sides (top vs. bottom, left vs. right) can differ by one pixel. In that case, the bottom and right sides always get the one additional padded pixel. For example, when `pad_along_height` is 5, we pad 2 pixels at the top and 3 pixels at the bottom. This differs from existing libraries such as cuDNN and Caffe, which explicitly specify the number of padded pixels and always pad the same number of pixels on both sides.
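The `'SAME'` rules above can be sketched for one spatial dimension as follows. `same_padding` is a hypothetical helper. It clamps the total padding at zero, so a filter smaller than the stride never produces negative padding:

```python
def same_padding(in_size, filter_size, stride):
    """'SAME' output size and (before, after) padding along one spatial dimension."""
    out_size = -(-in_size // stride)  # integer ceil(in_size / stride)
    # Total padding, clamped at zero so a filter smaller than the stride
    # never produces negative padding.
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2          # top (or left)
    pad_after = pad_total - pad_before   # bottom (or right) gets any extra pixel
    return out_size, pad_before, pad_after


print(same_padding(10, 6, 1))  # (10, 2, 3): pad_along_height is 5
print(same_padding(5, 3, 2))   # (3, 1, 1)
```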

For `'VALID'` padding, the output height and width are computed as:

```
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
```
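The `'VALID'` formula can be checked by brute force against the equivalent closed form `(in - filter) // stride + 1`, which counts the stride steps that keep the whole filter inside the input. This is a minimal sketch with a hypothetical helper name:

```python
def valid_out_size(in_size, filter_size, stride):
    """'VALID' output size along one dimension: integer ceil((in - filter + 1) / stride)."""
    return -(-(in_size - filter_size + 1) // stride)


# Brute-force check against the closed form floor((in - filter) / stride) + 1.
for n in range(1, 12):
    for f in range(1, n + 1):
        for s in range(1, 5):
            assert valid_out_size(n, f, s) == (n - f) // s + 1

print(valid_out_size(7, 3, 2))  # 3
```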

and no padding is added (`pad_top` and `pad_left` are always zero). The output is then computed as

```
output[b, i, j, :] =
sum_{di, dj} input[b, strides[1] * i + di - pad_top,
strides[2] * j + dj - pad_left, ...] *
filter[di, dj, ...]
```

where any value outside the original input image region is considered zero (i.e., we pad zero values around the border of the image).
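The following is a direct pure-Python transcription of the output formula for a single image and a single channel, combining the `'SAME'`/`'VALID'` size and padding rules with implicit zero padding. It is a reference sketch under those simplifying assumptions, not TensorFlow's implementation, and `conv2d_single_channel` is a hypothetical name:

```python
def conv2d_single_channel(image, kernel, stride_h, stride_w, padding):
    """Spatial semantics of the conv ops for one image and one channel.

    image: in_height x in_width nested list; kernel: filter_height x filter_width.
    Computes output[i][j] = sum_{di, dj} image[stride_h*i + di - pad_top]
    [stride_w*j + dj - pad_left] * kernel[di][dj], with out-of-range pixels as zero.
    """
    in_h, in_w = len(image), len(image[0])
    f_h, f_w = len(kernel), len(kernel[0])

    def size_and_pad(n, f, s):
        # Returns (output size, padding before) along one dimension.
        if padding == "SAME":
            out = -(-n // s)  # ceil(n / s)
            return out, max((out - 1) * s + f - n, 0) // 2
        if padding == "VALID":
            return -(-(n - f + 1) // s), 0  # ceil((n - f + 1) / s)
        raise ValueError("padding must be 'SAME' or 'VALID'")

    out_h, pad_top = size_and_pad(in_h, f_h, stride_h)
    out_w, pad_left = size_and_pad(in_w, f_w, stride_w)

    def pixel(r, c):
        # Zero outside the original image region (implicit zero padding).
        return image[r][c] if 0 <= r < in_h and 0 <= c < in_w else 0

    return [
        [
            sum(
                pixel(stride_h * i + di - pad_top, stride_w * j + dj - pad_left)
                * kernel[di][dj]
                for di in range(f_h)
                for dj in range(f_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
box = [[1, 1], [1, 1]]
print(conv2d_single_channel(image, box, 1, 1, "VALID"))  # [[12, 16], [24, 28]]
print(conv2d_single_channel(image, box, 1, 1, "SAME"))   # [[12, 16, 9], [24, 28, 15], [15, 17, 9]]
```

In the `'SAME'` case the single padded row and column land at the bottom and right. That is why the last row and column only sum the pixels that exist.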

Since `input` is 4-D, each `input[b, i, j, :]` is a vector. For `conv2d`, these vectors are multiplied by the `filter[di, dj, :, :]` matrices to produce new vectors. For `depthwise_conv2d`, each scalar component `input[b, i, j, k]` is multiplied by a vector `filter[di, dj, k, :]`, and all the vectors are concatenated.
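The channel arithmetic above can be shown for a single filter tap `(di, dj)` at a single pixel; summing over all taps gives the full op. This is a minimal sketch with hypothetical names, using nested lists in place of tensors:

```python
def conv2d_tap(x_vec, w_mat):
    """conv2d: the input vector (in_channels) times the matrix filter[di, dj, :, :]
    (in_channels x out_channels) gives a new vector of out_channels entries."""
    out_channels = len(w_mat[0])
    return [sum(x_vec[q] * w_mat[q][k] for q in range(len(x_vec)))
            for k in range(out_channels)]


def depthwise_tap(x_vec, w_mat):
    """depthwise_conv2d: each scalar x_vec[k] times the vector filter[di, dj, k, :]
    (channel_multiplier entries); the per-channel vectors are concatenated."""
    out = []
    for k, x_k in enumerate(x_vec):
        out.extend(x_k * w for w in w_mat[k])
    return out


x = [1.0, 2.0]                      # input[b, i, j, :] with in_channels = 2
w = [[1.0, 10.0], [100.0, 1000.0]]  # filter[di, dj, :, :], shape 2 x 2
print(conv2d_tap(x, w))     # [201.0, 2010.0]: channels are mixed
print(depthwise_tap(x, w))  # [1.0, 10.0, 200.0, 2000.0]: channels are kept apart
```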

`tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)`

Computes a 2-D convolution given 4-D `input` and `filter` tensors.

Given an input tensor of shape `[batch, in_height, in_width, in_channels]` and a filter / kernel tensor of shape `[filter_height, filter_width, in_channels, out_channels]`, this op performs the following:

- Flattens the filter to a 2-D matrix with shape `[filter_height * filter_width * in_channels, out_channels]`.
- Extracts image patches from the input tensor to form a *virtual* tensor of shape `[batch, out_height, out_width, filter_height * filter_width * in_channels]`.
- For each patch, right-multiplies the filter matrix and the image patch vector.
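The three steps can be sketched for a single image with `'VALID'` padding and stride 1. These simplifications are assumptions made to keep the example short. The sketch checks the flatten/extract/multiply route (often called "im2col") against the direct formula given below. Both functions are hypothetical reference code, not TensorFlow internals:

```python
def conv2d_im2col(inp, flt):
    """conv2d on one image via the three steps (VALID padding, stride 1).

    inp: [in_height][in_width][in_channels]
    flt: [filter_height][filter_width][in_channels][out_channels]
    Returns [out_height][out_width][out_channels].
    """
    in_h, in_w, in_c = len(inp), len(inp[0]), len(inp[0][0])
    f_h, f_w, out_c = len(flt), len(flt[0]), len(flt[0][0][0])
    taps = [(di, dj, q) for di in range(f_h) for dj in range(f_w) for q in range(in_c)]

    # Step 1: flatten the filter into a (f_h * f_w * in_c) x out_c matrix.
    matrix = [flt[di][dj][q] for di, dj, q in taps]

    out = []
    for i in range(in_h - f_h + 1):
        row = []
        for j in range(in_w - f_w + 1):
            # Step 2: extract the patch as a vector, in the same (di, dj, q) order.
            patch = [inp[i + di][j + dj][q] for di, dj, q in taps]
            # Step 3: multiply the patch vector by the filter matrix.
            row.append([sum(p * m[k] for p, m in zip(patch, matrix)) for k in range(out_c)])
        out.append(row)
    return out


def conv2d_direct(inp, flt):
    """Same result from output[i, j, k] = sum_{di, dj, q} inp[i+di, j+dj, q] * flt[di, dj, q, k]."""
    in_h, in_w, in_c = len(inp), len(inp[0]), len(inp[0][0])
    f_h, f_w, out_c = len(flt), len(flt[0]), len(flt[0][0][0])
    return [[[sum(inp[i + di][j + dj][q] * flt[di][dj][q][k]
                  for di in range(f_h) for dj in range(f_w) for q in range(in_c))
              for k in range(out_c)]
             for j in range(in_w - f_w + 1)]
            for i in range(in_h - f_h + 1)]


inp = [[[r + c, r * c] for c in range(4)] for r in range(3)]     # 3 x 4 x 2
flt = [[[[di - dj + q * k + 1 for k in range(3)] for q in range(2)]
        for dj in range(2)] for di in range(2)]                  # 2 x 2 x 2 x 3
assert conv2d_im2col(inp, flt) == conv2d_direct(inp, flt)
```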

In detail, with the default NHWC format,

```
output[b, i, j, k] =
sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
filter[di, dj, q, k]
```

Must have `strides[0] = strides[3] = 1`. For the most common case of the same horizontal and vertical strides, `strides = [1, stride, stride, 1]`.

##### Args:

- `input`: A `Tensor`. Must be one of the following types: `half`, `float32`, `float64`.
- `filter`: A `Tensor`. Must have the same type as `input`.
- `strides`: A list of `ints`. 1-D of length 4. The stride of the sliding window for each dimension of `input`. Must be in the same order as the dimensions specified by `data_format`.
- `padding`: A `string` from: `"SAME", "VALID"`. The type of padding algorithm to use.
- `use_cudnn_on_gpu`: An optional `bool`. Defaults to `True`.
- `data_format`: An optional `string` from: `"NHWC", "NCHW"`. Defaults to `"NHWC"`. Specifies the data format of the input and output data. With the default format `"NHWC"`, the data is stored in the order `[batch, in_height, in_width, in_channels]`. Alternatively, with `"NCHW"`, the data is stored in the order `[batch, in_channels, in_height, in_width]`.
- `name`: A name for the operation (optional).

##### Returns:

A `Tensor`. Has the same type as `input`.
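Because `strides` must follow the same dimension order as `data_format`, the usual equal-stride list differs between the two layouts. The following minimal sketch uses hypothetical helper names and nested lists in place of tensors. It shows the stride list for each layout and a reordering of a batch from NHWC to NCHW:

```python
def strides_for(stride, data_format="NHWC"):
    """Equal height/width strides, ordered like data_format (hypothetical helper)."""
    if data_format == "NHWC":
        return [1, stride, stride, 1]
    if data_format == "NCHW":
        return [1, 1, stride, stride]
    raise ValueError("data_format must be 'NHWC' or 'NCHW'")


def nhwc_to_nchw(x):
    """Reorder a nested-list batch from [b][h][w][c] to [b][c][h][w]."""
    return [
        [[[img[h][w][c] for w in range(len(img[0]))] for h in range(len(img))]
         for c in range(len(img[0][0]))]
        for img in x
    ]


batch = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]]  # shape [1, 2, 2, 2] in NHWC
print(strides_for(2, "NCHW"))  # [1, 1, 2, 2]
print(nhwc_to_nchw(batch))     # [[[[1, 3], [5, 7]], [[2, 4], [6, 8]]]]
```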

`tf.nn.depthwise_conv2d(input, filter, strides, padding, name=None)`

Depthwise 2-D convolution.

Given an input tensor of shape `[batch, in_height, in_width, in_channels]` and a filter tensor of shape `[filter_height, filter_width, in_channels, channel_multiplier]` containing `in_channels` convolutional filters of depth 1, `depthwise_conv2d` applies a different filter to each input channel (expanding from 1 channel to `channel_multiplier` channels for each), then concatenates the results together. The output has `in_channels * channel_multiplier` channels.

In detail,

```
output[b, i, j, k * channel_multiplier + q] =
sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, k] *
filter[di, dj, k, q]
```
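The formula above has no padding term, so it corresponds to `'VALID'` padding. The following is a direct pure-Python transcription for a single image. `depthwise_conv2d_ref` is a hypothetical name, and nested lists stand in for tensors:

```python
def depthwise_conv2d_ref(inp, flt, stride_h=1, stride_w=1):
    """Transcription of the depthwise formula for one image (no padding).

    inp: [in_height][in_width][in_channels]
    flt: [filter_height][filter_width][in_channels][channel_multiplier]
    Returns [out_height][out_width][in_channels * channel_multiplier].
    """
    in_h, in_w, in_c = len(inp), len(inp[0]), len(inp[0][0])
    f_h, f_w, mult = len(flt), len(flt[0]), len(flt[0][0][0])
    out_h = (in_h - f_h) // stride_h + 1
    out_w = (in_w - f_w) // stride_w + 1
    out = [[[0] * (in_c * mult) for _ in range(out_w)] for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for k in range(in_c):
                for q in range(mult):
                    # Output channel k * channel_multiplier + q only sees input channel k.
                    out[i][j][k * mult + q] = sum(
                        inp[stride_h * i + di][stride_w * j + dj][k] * flt[di][dj][k][q]
                        for di in range(f_h)
                        for dj in range(f_w)
                    )
    return out


inp = [[[1, 10]] * 2] * 2          # 2 x 2 image, channel 0 is all 1, channel 1 is all 10
flt = [[[[1, 2]] * 2] * 2] * 2     # every filter tap is [1, 2] (channel_multiplier = 2)
print(depthwise_conv2d_ref(inp, flt))  # [[[4, 8, 40, 80]]]
```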

Must have `strides[0] = strides[3] = 1`. For the most common case of the same horizontal and vertical strides, `strides = [1, stride, stride, 1]`.

##### Args:

- `input`: 4-D with shape `[batch, in_height, in_width, in_channels]`.
- `filter`: 4-D with shape `[filter_height, filter_width, in_channels, channel_multiplier]`.
- `strides`: 1-D of size 4. The stride of the sliding window for each dimension of `input`.
- `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm; see the description of `'SAME'` and `'VALID'` padding above.
- `name`: A name for this operation (optional).

##### Returns:

A 4-D `Tensor` of shape `[batch, out_height, out_width, in_channels * channel_multiplier]`.

`tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter, strides, padding, name=None)`

2-D convolution with separable filters.

Performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. Note that this is separability between dimensions `[1, 2]` and `3`, not spatial separability between dimensions `1` and `2`.

In detail,

```
output[b, i, j, k] = sum_{di, dj, q, r}
input[b, strides[1] * i + di, strides[2] * j + dj, q] *
depthwise_filter[di, dj, q, r] *
pointwise_filter[0, 0, q * channel_multiplier + r, k]
```
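The pointwise stage is a 1x1 convolution, so the combined formula can be evaluated either in one pass or as a depthwise stage followed by a channel-mixing stage. The following minimal sketch checks that the two agree. It assumes a single image, stride 1 and no padding, and all function names are hypothetical:

```python
def depthwise(inp, dw):
    """Depthwise stage: [H][W][C] -> [H'][W'][C * m]; channel index is q * m + r."""
    in_h, in_w, in_c = len(inp), len(inp[0]), len(inp[0][0])
    f_h, f_w, m = len(dw), len(dw[0]), len(dw[0][0][0])
    return [[[sum(inp[i + di][j + dj][q] * dw[di][dj][q][r]
                  for di in range(f_h) for dj in range(f_w))
              for q in range(in_c) for r in range(m)]
             for j in range(in_w - f_w + 1)]
            for i in range(in_h - f_h + 1)]


def pointwise(x, pw):
    """Pointwise (1x1) stage: mixes the C * m channels into out_channels."""
    out_c = len(pw[0][0][0])
    return [[[sum(v * pw[0][0][c][k] for c, v in enumerate(px)) for k in range(out_c)]
             for px in row] for row in x]


def separable_direct(inp, dw, pw):
    """The combined formula, evaluated in one pass."""
    in_h, in_w, in_c = len(inp), len(inp[0]), len(inp[0][0])
    f_h, f_w, m = len(dw), len(dw[0]), len(dw[0][0][0])
    out_c = len(pw[0][0][0])
    return [[[sum(inp[i + di][j + dj][q] * dw[di][dj][q][r] * pw[0][0][q * m + r][k]
                  for di in range(f_h) for dj in range(f_w)
                  for q in range(in_c) for r in range(m))
              for k in range(out_c)]
             for j in range(in_w - f_w + 1)]
            for i in range(in_h - f_h + 1)]


inp = [[[r + c, r - c] for c in range(3)] for r in range(3)]                          # 3 x 3 x 2
dw = [[[[di + dj + q + 1] for q in range(2)] for dj in range(2)] for di in range(2)]  # 2 x 2 x 2 x 1
pw = [[[[c * k + 1 for k in range(3)] for c in range(2)]]]                            # 1 x 1 x 2 x 3
assert pointwise(depthwise(inp, dw), pw) == separable_direct(inp, dw, pw)
```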

`strides` controls the strides for the depthwise convolution only, since the pointwise convolution has implicit strides of `[1, 1, 1, 1]`. Must have `strides[0] = strides[3] = 1`. For the most common case of the same horizontal and vertical strides, `strides = [1, stride, stride, 1]`.

##### Args:

- `input`: 4-D `Tensor` with shape `[batch, in_height, in_width, in_channels]`.
- `depthwise_filter`: 4-D `Tensor` with shape `[filter_height, filter_width, in_channels, channel_multiplier]`. Contains `in_channels` convolutional filters of depth 1.
- `pointwise_filter`: 4-D `Tensor` with shape `[1, 1, channel_multiplier * in_channels, out_channels]`. Pointwise filter to mix channels after `depthwise_filter` has convolved spatially.
- `strides`: 1-D of size 4. The strides for the depthwise convolution for each dimension of `input`.
- `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm; see the description of `'SAME'` and `'VALID'` padding above.
- `name`: A name for this operation (optional).

##### Returns:

A 4-D `Tensor` of shape `[batch, out_height, out_width, out_channels]`.

##### Raises:

- `ValueError`: If `channel_multiplier * in_channels > out_channels`, which means that the separable convolution is overparameterized.
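The overparameterization condition depends only on the filter shapes. The following sketch is a hypothetical helper, not TensorFlow's own validation code. It raises under the condition stated above. For comparison, it also counts the weights of the separable filters against a single `conv2d` filter with the same spatial size and channel counts:

```python
def check_separable_shapes(input_shape, depthwise_shape, pointwise_shape):
    """Hypothetical check of the ValueError condition, plus weight counts.

    input_shape:     [batch, in_height, in_width, in_channels]
    depthwise_shape: [filter_height, filter_width, in_channels, channel_multiplier]
    pointwise_shape: [1, 1, channel_multiplier * in_channels, out_channels]
    Returns (separable_weights, full_conv2d_weights).
    """
    in_channels = input_shape[3]
    filter_height, filter_width, _, channel_multiplier = depthwise_shape
    out_channels = pointwise_shape[3]
    if channel_multiplier * in_channels > out_channels:
        raise ValueError("separable convolution is overparameterized: "
                         "channel_multiplier * in_channels > out_channels")
    separable = (filter_height * filter_width * in_channels * channel_multiplier
                 + channel_multiplier * in_channels * out_channels)
    full = filter_height * filter_width * in_channels * out_channels
    return separable, full


print(check_separable_shapes([8, 32, 32, 32], [3, 3, 32, 1], [1, 1, 32, 64]))
# (2336, 18432)
```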

`tf.nn.atrous_conv2d(value, filters, rate, padding, name=None)`

Atrous convolution (a.k.a. convolution with holes or dilated convolution).

Computes a 2-D atrous convolution, also known as convolution with holes or dilated convolution, given 4-D `value` and `filters` tensors. If the `rate` parameter is equal to one, it performs regular 2-D convolution. If the `rate` parameter is greater than one, it performs convolution with holes, sampling the input values every `rate` pixels in the `height` and `width` dimensions. This is equivalent to convolving the input with a set of upsampled filters, produced by inserting `rate - 1` zeros between two consecutive values of the filters along the `height` and `width` dimensions. Hence the names atrous convolution and convolution with holes (the French word trous means holes in English).
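Filter upsampling can be sketched for one 2-D filter slice (a single input/output channel pair). `upsample_filter` is a hypothetical helper. The effective size matches the `filter_height + (filter_height - 1) * (rate - 1)` expression given under Args:

```python
def upsample_filter(f, rate):
    """Insert rate - 1 zeros between consecutive filter taps along height and width."""
    f_h, f_w = len(f), len(f[0])
    eff_h = f_h + (f_h - 1) * (rate - 1)
    eff_w = f_w + (f_w - 1) * (rate - 1)
    up = [[0] * eff_w for _ in range(eff_h)]
    for di in range(f_h):
        for dj in range(f_w):
            up[rate * di][rate * dj] = f[di][dj]  # original taps land every `rate` pixels
    return up


print(upsample_filter([[1, 2], [3, 4]], 2))
# [[1, 0, 2], [0, 0, 0], [3, 0, 4]]
```

For a 3x3 filter and `rate = 2`, the upsampled filter is 5x5 but still has only 9 non-zero taps. In this sense the field of view grows without adding parameters.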

More specifically:

```
output[b, i, j, k] = sum_{di, dj, q} filters[di, dj, q, k] *
value[b, i + rate * di, j + rate * dj, q]
```
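The two descriptions give the same result: sampling the input every `rate` pixels, and convolving with the upsampled filter. The following sketch checks this on a small single-channel example. It keeps only output positions where every tap lands inside the input, which is a `'VALID'`-style assumption, since the formula above shows no padding. All names are hypothetical:

```python
def atrous_conv2d_valid(value, filters, rate):
    """output[i][j] = sum_{di, dj} filters[di][dj] * value[i + rate*di][j + rate*dj].

    Single image, single channel; only positions where every tap is in bounds.
    """
    f_h, f_w = len(filters), len(filters[0])
    out_h = len(value) - (f_h - 1) * rate
    out_w = len(value[0]) - (f_w - 1) * rate
    return [[sum(filters[di][dj] * value[i + rate * di][j + rate * dj]
                 for di in range(f_h) for dj in range(f_w))
             for j in range(out_w)]
            for i in range(out_h)]


def upsample_filter(f, rate):
    """Insert rate - 1 zeros between consecutive filter taps along height and width."""
    f_h, f_w = len(f), len(f[0])
    up = [[0] * (f_w + (f_w - 1) * (rate - 1)) for _ in range(f_h + (f_h - 1) * (rate - 1))]
    for di in range(f_h):
        for dj in range(f_w):
            up[rate * di][rate * dj] = f[di][dj]
    return up


def conv2d_valid(value, kernel):
    """Ordinary stride-1 'VALID' cross-correlation."""
    k_h, k_w = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * value[i + a][j + b] for a in range(k_h) for b in range(k_w))
             for j in range(len(value[0]) - k_w + 1)]
            for i in range(len(value) - k_h + 1)]


value = [[5 * r + c for c in range(5)] for r in range(5)]
filters = [[1, 2], [3, 4]]
assert atrous_conv2d_valid(value, filters, 2) == conv2d_valid(value, upsample_filter(filters, 2))
assert atrous_conv2d_valid(value, filters, 1) == conv2d_valid(value, filters)  # rate 1: regular conv
```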

Atrous convolution allows us to explicitly control how densely to compute feature responses in fully convolutional networks. Used in conjunction with bilinear interpolation, it offers an alternative to `conv2d_transpose` in dense prediction tasks such as semantic image segmentation, optical flow computation, or depth estimation. It also allows us to effectively enlarge the field of view of filters without increasing the number of parameters or the amount of computation.

For a description of atrous convolution and how it can be used for dense feature extraction, please see: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. The same operation is investigated further in Multi-Scale Context Aggregation by Dilated Convolutions. Previous works that effectively use atrous convolution in different ways are, among others, OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks and [Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks](http://arxiv.org/abs/1302.1700). Atrous convolution is also closely related to the so-called noble identities in multi-rate signal processing.

There are many different ways to implement atrous convolution (see the refs above). The implementation here reduces

```
atrous_conv2d(value, filters, rate, padding=padding)
```

to the following three operations:

```
paddings = ...
net = space_to_batch(value, paddings, block_size=rate)
net = conv2d(net, filters, strides=[1, 1, 1, 1], padding="VALID")
crops = ...
net = batch_to_space(net, crops, block_size=rate)
```
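The reduction can be checked in pure Python. In the sketch below, `space_to_batch` and `batch_to_space` are hypothetical single-image, single-channel stand-ins for the TensorFlow ops of the same names. They do not reproduce TensorFlow's batch ordering. The input height and width are assumed to be multiples of `rate`, so `paddings` and `crops` are empty, and `'VALID'` padding is assumed throughout:

```python
def conv2d_valid(value, kernel):
    """Ordinary stride-1 'VALID' cross-correlation."""
    k_h, k_w = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * value[i + a][j + b] for a in range(k_h) for b in range(k_w))
             for j in range(len(value[0]) - k_w + 1)]
            for i in range(len(value) - k_h + 1)]


def atrous_conv2d_valid(value, filters, rate):
    """Direct formula: sum_{di, dj} filters[di][dj] * value[i + rate*di][j + rate*dj]."""
    f_h, f_w = len(filters), len(filters[0])
    return [[sum(filters[di][dj] * value[i + rate * di][j + rate * dj]
                 for di in range(f_h) for dj in range(f_w))
             for j in range(len(value[0]) - (f_w - 1) * rate)]
            for i in range(len(value) - (f_h - 1) * rate)]


def space_to_batch(x, rate):
    """Sub-image (a, b) keeps x[a + rate*i][b + rate*j]; dims must be multiples of rate."""
    return [[[row[b::rate] for row in x[a::rate]] for b in range(rate)] for a in range(rate)]


def batch_to_space(subs, rate):
    """Inverse interleave: out[a + rate*i][b + rate*j] = subs[a][b][i][j]."""
    h, w = len(subs[0][0]), len(subs[0][0][0])
    out = [[0] * (w * rate) for _ in range(h * rate)]
    for a in range(rate):
        for b in range(rate):
            for i in range(h):
                for j in range(w):
                    out[a + rate * i][b + rate * j] = subs[a][b][i][j]
    return out


def atrous_via_space_to_batch(value, filters, rate):
    net = space_to_batch(value, rate)
    net = [[conv2d_valid(sub, filters) for sub in row] for row in net]  # stride 1, VALID
    return batch_to_space(net, rate)


value = [[7 * r + c * c for c in range(4)] for r in range(6)]   # 6 x 4, multiples of rate 2
filters = [[1, -1], [2, 3]]
assert atrous_via_space_to_batch(value, filters, 2) == atrous_conv2d_valid(value, filters, 2)
```

Each sub-image holds pixels that are `rate` apart in the original image. A plain convolution on a sub-image therefore samples the original input every `rate` pixels, which is exactly the atrous formula.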

Advanced usage. Note the following optimization: a sequence of `atrous_conv2d` operations with identical `rate` parameters, `'SAME'` `padding`, and filters with odd heights/widths:

```
net = atrous_conv2d(net, filters1, rate, padding="SAME")
net = atrous_conv2d(net, filters2, rate, padding="SAME")
...
net = atrous_conv2d(net, filtersK, rate, padding="SAME")
```

can be performed equivalently, and more cheaply in terms of computation and memory, as:

```
pad = ... # padding so that the input dims are multiples of rate
net = space_to_batch(net, paddings=pad, block_size=rate)
net = conv2d(net, filters1, strides=[1, 1, 1, 1], padding="SAME")
net = conv2d(net, filters2, strides=[1, 1, 1, 1], padding="SAME")
...
net = conv2d(net, filtersK, strides=[1, 1, 1, 1], padding="SAME")
net = batch_to_space(net, crops=pad, block_size=rate)
```

because a pair of consecutive `space_to_batch` and `batch_to_space` ops with the same `block_size` cancel out when their respective `paddings` and `crops` inputs are identical.
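When the chained `atrous_conv2d` calls are expanded, each `batch_to_space` is immediately followed by a `space_to_batch` with the same `block_size`, so the pair can be dropped. The following minimal round-trip check assumes empty `paddings`/`crops` (image sides are multiples of `rate`). It uses hypothetical single-image helpers, not the TensorFlow ops:

```python
def space_to_batch(x, rate):
    """Sub-image (a, b) keeps x[a + rate*i][b + rate*j]; dims must be multiples of rate."""
    return [[[row[b::rate] for row in x[a::rate]] for b in range(rate)] for a in range(rate)]


def batch_to_space(subs, rate):
    """Inverse interleave: out[a + rate*i][b + rate*j] = subs[a][b][i][j]."""
    h, w = len(subs[0][0]), len(subs[0][0][0])
    out = [[0] * (w * rate) for _ in range(h * rate)]
    for a in range(rate):
        for b in range(rate):
            for i in range(h):
                for j in range(w):
                    out[a + rate * i][b + rate * j] = subs[a][b][i][j]
    return out


image = [[10 * r + c for c in range(6)] for r in range(4)]
subs = space_to_batch(image, 2)
assert batch_to_space(subs, 2) == image                    # batch_to_space undoes space_to_batch
assert space_to_batch(batch_to_space(subs, 2), 2) == subs  # and the other way round
```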

##### Args:

- `value`: A 4-D `Tensor` of type `float`. It needs to be in the default "NHWC" format. Its shape is `[batch, in_height, in_width, in_channels]`.
- `filters`: A 4-D `Tensor` with the same type as `value` and shape `[filter_height, filter_width, in_channels, out_channels]`. The `in_channels` dimension of `filters` must match that of `value`. Atrous convolution is equivalent to standard convolution with upsampled filters with effective height `filter_height + (filter_height - 1) * (rate - 1)` and effective width `filter_width + (filter_width - 1) * (rate - 1)`, produced by inserting `rate - 1` zeros along consecutive elements across the spatial dimensions of `filters`.
- `rate`: A positive int32. The stride with which we sample input values across the `height` and `width` dimensions. Equivalently, the rate by which we upsample the filter values by inserting zeros across the `height` and `width` dimensions. In the literature, the same parameter is sometimes called `input stride` or `dilation`.
- `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
- `name`: Optional name for the returned tensor.

##### Returns:

A `Tensor` with the same type as `value`.

##### Raises:

- `ValueError`: If input/output depth does not match the shape of `filters`, or if padding is other than `'VALID'` or `'SAME'`.

`tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding='SAME', name=None)`

The transpose of `conv2d`.

This operation is sometimes called "deconvolution" after Deconvolutional Networks, but it is actually the transpose (gradient) of `conv2d` rather than an actual deconvolution.
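The phrase "transpose (gradient)" can be made concrete in 1-D. If `conv1d` is the linear map from `x` to `y`, its transpose satisfies `<conv(x), y> == <x, conv_transpose(y)>`. That same map carries gradients from `y` back to `x`. The following minimal pure-Python sketch is strided, `'VALID'` and single-channel; the names are hypothetical, not TensorFlow functions:

```python
def conv1d(x, w, stride):
    """VALID, strided 1-D cross-correlation: y[i] = sum_d x[stride*i + d] * w[d]."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(x[stride * i + d] * w[d] for d in range(len(w))) for i in range(n_out)]


def conv1d_transpose(y, w, stride, out_len):
    """Transpose of conv1d: scatter each y[i] back through the filter taps.

    out_len must be given: several input lengths map to the same len(y).
    """
    x = [0] * out_len
    for i, y_i in enumerate(y):
        for d, w_d in enumerate(w):
            x[stride * i + d] += y_i * w_d
    return x


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x = [1, -2, 3, 0, 5]
w = [2, 1, -1]
y = [4, -3]  # same length as conv1d(x, w, 2)
# Adjoint identity: <conv(x), y> == <x, conv_transpose(y)>
assert dot(conv1d(x, w, 2), y) == dot(x, conv1d_transpose(y, w, 2, len(x)))
print(conv1d_transpose(y, w, 2, len(x)))  # [8, 4, -10, -3, 3]
```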

##### Args:

- `value`: A 4-D `Tensor` of type `float` and shape `[batch, height, width, in_channels]`.
- `filter`: A 4-D `Tensor` with the same type as `value` and shape `[height, width, output_channels, in_channels]`. The `in_channels` dimension of `filter` must match that of `value`.
- `output_shape`: A 1-D `Tensor` representing the output shape of the deconvolution op.
- `strides`: A list of ints. The stride of the sliding window for each dimension of the input tensor.
- `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm; see the description of `'SAME'` and `'VALID'` padding above.
- `name`: Optional name for the returned tensor.

##### Returns:

A `Tensor` with the same type as `value`.

##### Raises:

- `ValueError`: If input/output depth does not match the shape of `filter`, or if padding is other than `'VALID'` or `'SAME'`.
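With strides greater than 1, several output sizes map to the same input size under the forward `conv2d` formulas. The spatial size of the result therefore cannot always be recovered from `value` alone, which is presumably one reason `output_shape` is an explicit argument. The following sketch lists the sizes along one dimension that are consistent with the forward `'SAME'`/`'VALID'` formulas. It makes no claim about which of them TensorFlow accepts, and the helper names are hypothetical:

```python
def forward_out_size(n, f, s, padding):
    """Output size of a strided conv along one dimension (SAME/VALID formulas)."""
    if padding == "SAME":
        return -(-n // s)            # ceil(n / s)
    if padding == "VALID":
        return -(-(n - f + 1) // s)  # ceil((n - f + 1) / s)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def transpose_output_candidates(in_size, f, s, padding, max_size=64):
    """All sizes n <= max_size that a forward conv with these settings maps to in_size."""
    smallest = 1 if padding == "SAME" else f
    return [n for n in range(smallest, max_size + 1)
            if forward_out_size(n, f, s, padding) == in_size]


print(transpose_output_candidates(3, 3, 2, "SAME"))   # [5, 6]
print(transpose_output_candidates(3, 3, 2, "VALID"))  # [7, 8]
```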

`tf.nn.conv3d(input, filter, strides, padding, name=None)`

Computes a 3-D convolution given 5-D `input` and `filter` tensors.

In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag applied to one of them. This is also known as a sliding dot product or sliding inner-product.

Our Conv3D implements a form of cross-correlation.
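The text above does not spell out a formula. The following sketch therefore assumes that `conv3d` extends the 2-D cross-correlation formula by one more sliding dimension (depth). It is a minimal single-channel sketch with stride 1 and `'VALID'`-style bounds, and `conv3d_valid` is a hypothetical name:

```python
def conv3d_valid(vol, kernel):
    """Single-channel 3-D cross-correlation, stride 1, VALID-style bounds:

    out[t][i][j] = sum_{dt, di, dj} vol[t + dt][i + di][j + dj] * kernel[dt][di][dj]
    """
    k_d, k_h, k_w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_d = len(vol) - k_d + 1
    out_h = len(vol[0]) - k_h + 1
    out_w = len(vol[0][0]) - k_w + 1
    return [[[sum(vol[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                  for dt in range(k_d) for di in range(k_h) for dj in range(k_w))
              for j in range(out_w)]
             for i in range(out_h)]
            for t in range(out_d)]


vol = [[[9 * t + 3 * i + j for j in range(3)] for i in range(3)] for t in range(3)]  # 3 x 3 x 3
depth_diff = [[[1]], [[-1]]]  # 2 x 1 x 1 filter: difference between adjacent depth slices
print(conv3d_valid(vol, depth_diff)[0])  # [[-9, -9, -9], [-9, -9, -9], [-9, -9, -9]]
```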

##### Args:

- `input`: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int64`, `int32`, `uint8`, `uint16`, `int16`, `int8`, `complex64`, `complex128`, `qint8`, `quint8`, `qint32`, `half`. Shape `[batch, in_depth, in_height, in_width, in_channels]`.
- `filter`: A `Tensor`. Must have the same type as `input`. Shape `[filter_depth, filter_height, filter_width, in_channels, out_channels]`. `in_channels` must match between `input` and `filter`.
- `strides`: A list of `ints` that has length `>= 5`. 1-D tensor of length 5. The stride of the sliding window for each dimension of `input`. Must have `strides[0] = strides[4] = 1`.
- `padding`: A `string` from: `"SAME", "VALID"`. The type of padding algorithm to use.
- `name`: A name for the operation (optional).

##### Returns:

A `Tensor`. Has the same type as `input`.