This package provides several ops that take care of creating variables that are used internally in a consistent way and provide the building blocks for many common machine learning algorithms.

`tf.contrib.layers.avg_pool2d(*args, **kwargs)`

Adds a 2D average pooling op.

Pooling is applied per image over the spatial dimensions, not over the batch or channel dimensions.

##### Args:

*  `inputs`: A 4-D tensor of shape `[batch_size, height, width, channels]` if `data_format` is `NHWC`, and `[batch_size, channels, height, width]` if `data_format` is `NCHW`.
*  `kernel_size`: A list of length 2: [kernel_height, kernel_width] of the pooling kernel over which the op is computed. Can be an int if both values are the same.
*  `stride`: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
*  `padding`: The padding method, either 'VALID' or 'SAME'.
*  `data_format`: A string. `NHWC` (default) and `NCHW` are supported.
*  `outputs_collections`: The collections to which the outputs are added.
*  `scope`: Optional scope for name_scope.

##### Returns:

A `Tensor` representing the results of the pooling operation.

##### Raises:

*  `ValueError`: If `data_format` is neither `NHWC` nor `NCHW`.
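
A minimal usage sketch (assuming TensorFlow 1.x with `tf.contrib` available; the input shape is illustrative):

```
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
# 2x2 average pooling with stride 2 halves each spatial dimension.
pooled = tf.contrib.layers.avg_pool2d(images, kernel_size=2, stride=2)
# pooled has shape [None, 112, 112, 3].
```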

`tf.contrib.layers.batch_norm(*args, **kwargs)`

Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.

"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"

Sergey Ioffe, Christian Szegedy

Can be used as a normalizer function for conv2d and fully_connected.

When `is_training` is `True`, the `moving_mean` and `moving_variance` need to be updated. By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they need to be added as a dependency to the `train_op`, for example:

```
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
  updates = tf.group(*update_ops)
  total_loss = control_flow_ops.with_dependencies([updates], total_loss)
```

One can set updates_collections=None to force the updates in place, but that can have a speed penalty, especially in distributed settings.

##### Args:

*  `inputs`: A tensor with 2 or more dimensions, where the first dimension has `batch_size`. The normalization is over all but the last dimension if `data_format` is `NHWC` and the second dimension if `data_format` is `NCHW`.
*  `decay`: Decay for the moving average. Reasonable values for `decay` are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower the `decay` value (try `decay=0.9`) if the model experiences reasonably good training performance but poor validation and/or test performance. Try `zero_debias_moving_mean=True` for improved stability.
*  `center`: If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored.
*  `scale`: If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer.
*  `epsilon`: Small float added to variance to avoid dividing by zero.
*  `activation_fn`: Activation function, default set to None to skip it and maintain a linear activation.
*  `param_initializers`: Optional initializers for beta, gamma, moving mean and moving variance.
*  `updates_collections`: Collections to collect the update ops for computation. The update ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place.
*  `is_training`: Whether or not the layer is in training mode. In training mode it would accumulate the statistics of the moments into `moving_mean` and `moving_variance` using an exponential moving average with the given `decay`. When it is not in training mode then it would use the values of the `moving_mean` and the `moving_variance`.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional collections for the variables.
*  `outputs_collections`: Collections to add the outputs.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
*  `batch_weights`: An optional tensor of shape `[batch_size]`, containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
*  `fused`: Use nn.fused_batch_norm if True, nn.batch_normalization otherwise.
*  `data_format`: A string. `NHWC` (default) and `NCHW` are supported.
*  `zero_debias_moving_mean`: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
*  `scope`: Optional scope for `variable_scope`.

##### Returns:

A `Tensor` representing the output of the operation.

##### Raises:

*  `ValueError`: If `batch_weights` is not None and `fused` is True.
*  `ValueError`: If `data_format` is neither `NHWC` nor `NCHW`.
*  `ValueError`: If the rank of `inputs` is undefined.
*  `ValueError`: If rank or channels dimension of `inputs` is undefined.
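
A hedged sketch of the common pattern, batch_norm as the normalizer of a conv layer plus the update-op dependency (assumes TensorFlow 1.x; the shapes and loss are illustrative):

```
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
training = tf.placeholder(tf.bool, shape=[])
# batch_norm replaces the conv biases when passed as normalizer_fn.
net = tf.contrib.layers.conv2d(
    images, num_outputs=64, kernel_size=3,
    normalizer_fn=tf.contrib.layers.batch_norm,
    normalizer_params={'is_training': training, 'decay': 0.99})
loss = tf.reduce_mean(net)  # stand-in loss for illustration
# Run the moving-average updates together with the train step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```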

`tf.contrib.layers.convolution2d(*args, **kwargs)`

Adds an N-D convolution followed by an optional batch_norm layer.

It is required that 1 <= N <= 3.

`convolution` creates a variable called `weights`, representing the
convolutional kernel, that is convolved (actually cross-correlated) with the
`inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
provided (such as `batch_norm`), it is then applied. Otherwise, if
`normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
variable would be created and added to the activations. Finally, if
`activation_fn` is not `None`, it is applied to the activations as well.

Performs atrous convolution with input stride/dilation rate equal to `rate`
if a value > 1 for any dimension of `rate` is specified. In this case
`stride` values != 1 are not supported.

##### Args:

*  `inputs`: A Tensor of rank N+2 of shape `[batch_size] + input_spatial_shape + [in_channels]` if data_format does not start with "NC" (default), or `[batch_size, in_channels] + input_spatial_shape` if data_format starts with "NC".
*  `num_outputs`: Integer, the number of output filters.
*  `kernel_size`: A sequence of N positive integers specifying the spatial dimensions of the filters. Can be a single integer to specify the same value for all spatial dimensions.
*  `stride`: A sequence of N positive integers specifying the stride at which to compute output. Can be a single integer to specify the same value for all spatial dimensions. Specifying any `stride` value != 1 is incompatible with specifying any `rate` value != 1.
*  `padding`: One of `"VALID"` or `"SAME"`.
*  `data_format`: A string or None. Specifies whether the channel dimension of the `input` and output is the last dimension (default, or if `data_format` does not start with "NC"), or the second dimension (if `data_format` starts with "NC"). For N=1, the valid values are "NWC" (default) and "NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For N=3, currently the only valid value is "NDHWC".
*  `rate`: A sequence of N positive integers specifying the dilation rate to use for atrous convolution. Can be a single integer to specify the same value for all spatial dimensions. Specifying any `rate` value != 1 is incompatible with specifying any `stride` value != 1.
*  `activation_fn`: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
*  `normalizer_fn`: Normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default set to None for no normalizer function.
*  `normalizer_params`: Normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If None skip biases.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional list of collections for all the variables or a dictionary containing a different list of collection per variable.
*  `outputs_collections`: Collection to add the outputs.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
*  `scope`: Optional scope for `variable_scope`.

##### Returns:

A tensor representing the output of the operation.

##### Raises:

*  `ValueError`: If `data_format` is invalid.
*  `ValueError`: Both `rate` and `stride` are not uniformly 1.
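
A brief sketch of the 2-D case (assuming TensorFlow 1.x; shapes are illustrative):

```
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
# 32 filters of 5x5 with SAME padding; ReLU is applied by default.
net = tf.contrib.layers.conv2d(images, num_outputs=32, kernel_size=[5, 5],
                               padding='SAME', scope='conv1')
# Atrous variant: a dilation rate > 1 requires stride == 1.
net = tf.contrib.layers.conv2d(net, 32, [3, 3], rate=2, scope='conv2')
```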

`tf.contrib.layers.conv2d_in_plane(*args, **kwargs)`

Performs the same in-plane convolution to each channel independently.

This is useful for performing various simple channel-independent convolution operations such as image gradients:

```
image = tf.constant(..., shape=(16, 240, 320, 3))
vert_gradients = layers.conv2d_in_plane(image,
                                        kernel=[1, -1],
                                        kernel_size=[2, 1])
horz_gradients = layers.conv2d_in_plane(image,
                                        kernel=[1, -1],
                                        kernel_size=[1, 2])
```

##### Args:

*  `inputs`: A 4-D tensor with dimensions [batch_size, height, width, channels].
*  `kernel_size`: A list of length 2 holding the [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
*  `stride`: A list of length 2 `[stride_height, stride_width]`. Can be an int if both strides are the same. Note that presently both strides must have the same value.
*  `padding`: The padding type to use, either 'SAME' or 'VALID'.
*  `activation_fn`: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
*  `normalizer_fn`: Normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default set to None for no normalizer function.
*  `normalizer_params`: Normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If None skip biases.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional list of collections for all the variables or a dictionary containing a different list of collection per variable.
*  `outputs_collections`: Collection to add the outputs.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
*  `scope`: Optional scope for `variable_scope`.

##### Returns:

A `Tensor` representing the output of the operation.

`tf.contrib.layers.convolution2d_in_plane(*args, **kwargs)`

Performs the same in-plane convolution to each channel independently.

This is useful for performing various simple channel-independent convolution operations such as image gradients:

```
image = tf.constant(..., shape=(16, 240, 320, 3))
vert_gradients = layers.conv2d_in_plane(image,
                                        kernel=[1, -1],
                                        kernel_size=[2, 1])
horz_gradients = layers.conv2d_in_plane(image,
                                        kernel=[1, -1],
                                        kernel_size=[1, 2])
```

##### Args:

*  `inputs`: A 4-D tensor with dimensions [batch_size, height, width, channels].
*  `kernel_size`: A list of length 2 holding the [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
*  `stride`: A list of length 2 `[stride_height, stride_width]`. Can be an int if both strides are the same. Note that presently both strides must have the same value.
*  `padding`: The padding type to use, either 'SAME' or 'VALID'.
*  `activation_fn`: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
*  `normalizer_fn`: Normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default set to None for no normalizer function.
*  `normalizer_params`: Normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If None skip biases.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional list of collections for all the variables or a dictionary containing a different list of collection per variable.
*  `outputs_collections`: Collection to add the outputs.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
*  `scope`: Optional scope for `variable_scope`.

##### Returns:

A `Tensor` representing the output of the operation.

`tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding='SAME', data_format='NHWC', name=None)`

The transpose of `conv2d`.

This operation is sometimes called "deconvolution" after Deconvolutional
Networks, but is actually the transpose (gradient) of `conv2d` rather than an
actual deconvolution.

##### Args:

*  `value`: A 4-D `Tensor` of type `float` and shape `[batch, height, width, in_channels]` for `NHWC` data format or `[batch, in_channels, height, width]` for `NCHW` data format.
*  `filter`: A 4-D `Tensor` with the same type as `value` and shape `[height, width, output_channels, in_channels]`. `filter`'s `in_channels` dimension must match that of `value`.
*  `output_shape`: A 1-D `Tensor` representing the output shape of the deconvolution op.
*  `strides`: A list of ints. The stride of the sliding window for each dimension of the input tensor.
*  `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
*  `data_format`: A string. 'NHWC' and 'NCHW' are supported.
*  `name`: Optional name for the returned tensor.

##### Returns:

A `Tensor` with the same type as `value`.

##### Raises:

*  `ValueError`: If input/output depth does not match `filter`'s shape, or if padding is other than `'VALID'` or `'SAME'`.
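
A minimal sketch of an upsampling deconvolution (assuming TensorFlow 1.x; shapes are illustrative):

```
import tensorflow as tf

value = tf.placeholder(tf.float32, shape=[8, 16, 16, 64])
# Note the filter layout: [height, width, output_channels, in_channels].
filt = tf.get_variable('deconv_filter', shape=[4, 4, 32, 64])
# Stride 2 with SAME padding doubles the spatial size: 16 -> 32.
output = tf.nn.conv2d_transpose(value, filt,
                                output_shape=[8, 32, 32, 32],
                                strides=[1, 2, 2, 1], padding='SAME')
```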

`tf.contrib.layers.convolution2d_transpose(*args, **kwargs)`

Adds a convolution2d_transpose with an optional batch normalization layer.

The function creates a variable called `weights`, representing the kernel,
that is convolved with the input. If `batch_norm_params` is `None`, a second
variable called 'biases' is added to the result of the operation.

##### Args:

*  `inputs`: A 4-D `Tensor` of type `float` and shape `[batch, height, width, in_channels]` for `NHWC` data format or `[batch, in_channels, height, width]` for `NCHW` data format.
*  `num_outputs`: Integer, the number of output filters.
*  `kernel_size`: A list of length 2 holding the [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
*  `stride`: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
*  `padding`: One of 'VALID' or 'SAME'.
*  `data_format`: A string. `NHWC` (default) and `NCHW` are supported.
*  `activation_fn`: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
*  `normalizer_fn`: Normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default set to None for no normalizer function.
*  `normalizer_params`: Normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If None skip biases.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional list of collections for all the variables or a dictionary containing a different list of collection per variable.
*  `outputs_collections`: Collection to add the outputs.
*  `trainable`: Whether or not the variables should be trainable.
*  `scope`: Optional scope for variable_scope.

##### Returns:

A tensor representing the output of the operation.

##### Raises:

*  `ValueError`: If 'kernel_size' is not a list of length 2.
*  `ValueError`: If `data_format` is neither `NHWC` nor `NCHW`.
*  `ValueError`: If `C` dimension of `inputs` is None.

`tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)`

Computes dropout.

With probability `keep_prob`, outputs the input element scaled up by
`1 / keep_prob`, otherwise outputs `0`. The scaling is so that the expected
sum is unchanged.

By default, each element is kept or dropped independently. If `noise_shape`
is specified, it must be broadcastable to the shape of `x`, and only
dimensions with `noise_shape[i] == shape(x)[i]` will make independent
decisions. For example, if `shape(x) = [k, l, m, n]` and
`noise_shape = [k, 1, 1, n]`, each batch and channel component will be
kept independently and each row and column will be kept or not kept together.

##### Args:

*  `x`: A tensor.
*  `keep_prob`: A scalar `Tensor` with the same type as x. The probability that each element is kept.
*  `noise_shape`: A 1-D `Tensor` of type `int32`, representing the shape for randomly generated keep/drop flags.
*  `seed`: A Python integer. Used to create random seeds. See `set_random_seed` for behavior.
*  `name`: A name for this operation (optional).

##### Returns:

A Tensor of the same shape as `x`.

##### Raises:

*  `ValueError`: If `keep_prob` is not in `(0, 1]`.
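
A short sketch contrasting element-wise dropout with shared spatial noise (assumes TensorFlow 1.x; the fixed batch size is illustrative):

```
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[128, 20, 20, 64])
# Standard dropout: each element is kept independently with probability 0.5.
y = tf.nn.dropout(x, keep_prob=0.5)
# noise_shape broadcasting over height and width drops or keeps whole
# feature maps together per example and channel.
y_maps = tf.nn.dropout(x, keep_prob=0.5, noise_shape=[128, 1, 1, 64])
```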

`tf.contrib.layers.flatten(*args, **kwargs)`

Flattens the input while maintaining the batch_size.

Assumes that the first dimension represents the batch.

##### Args:

*  `inputs`: A tensor of size [batch_size, ...].
*  `outputs_collections`: Collection to add the outputs.
*  `scope`: Optional scope for name_scope.

##### Returns:

A flattened tensor with shape [batch_size, k].

##### Raises:

*  `ValueError`: If inputs rank is unknown or less than 2.
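
A one-line sketch (assuming TensorFlow 1.x; the shape is illustrative):

```
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
# Collapses all but the batch dimension: [None, 28, 28, 1] -> [None, 784].
flat = tf.contrib.layers.flatten(images)
```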

`tf.contrib.layers.fully_connected(*args, **kwargs)`

Adds a fully connected layer.

`fully_connected` creates a variable called `weights`, representing a fully
connected weight matrix, which is multiplied by the `inputs` to produce a
`Tensor` of hidden units. If a `normalizer_fn` is provided (such as
`batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is
None and a `biases_initializer` is provided then a `biases` variable would be
created and added to the hidden units. Finally, if `activation_fn` is not
`None`, it is applied to the hidden units as well.

##### Args:

*  `inputs`: A tensor of at least rank 2 and static value for the last dimension; i.e. `[batch_size, depth]`, `[None, None, None, channels]`.
*  `num_outputs`: Integer or long, the number of output units in the layer.
*  `activation_fn`: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
*  `normalizer_fn`: Normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default set to None for no normalizer function.
*  `normalizer_params`: Normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If None skip biases.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional list of collections for all the variables or a dictionary containing a different list of collections per variable.
*  `outputs_collections`: Collection to add the outputs.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
*  `scope`: Optional scope for variable_scope.

##### Returns:

The tensor variable representing the result of the series of operations.

##### Raises:

*  `ValueError`: If x has rank less than 2 or if its last dimension is not set.
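
A small MLP sketch (assuming TensorFlow 1.x; sizes are illustrative):

```
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 784])
# Two ReLU layers (ReLU is the default activation) and linear logits.
net = tf.contrib.layers.fully_connected(inputs, 256, scope='fc1')
net = tf.contrib.layers.fully_connected(net, 128, scope='fc2')
logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None,
                                           scope='logits')
```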

`tf.contrib.layers.layer_norm(*args, **kwargs)`

Adds a Layer Normalization layer from https://arxiv.org/abs/1607.06450.

"Layer Normalization"

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

Can be used as a normalizer function for conv2d and fully_connected.

##### Args:

*  `inputs`: A tensor with 2 or more dimensions. The normalization occurs over all but the first dimension.
*  `center`: If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored.
*  `scale`: If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer.
*  `activation_fn`: Activation function, default set to None to skip it and maintain a linear activation.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional collections for the variables.
*  `outputs_collections`: Collections to add the outputs.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
*  `scope`: Optional scope for `variable_scope`.

##### Returns:

A `Tensor` representing the output of the operation.

##### Raises:

*  `ValueError`: If rank or last dimension of `inputs` is undefined.
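
A minimal sketch (assuming TensorFlow 1.x). Unlike batch_norm, layer_norm keeps no moving statistics, so it behaves identically in training and inference:

```
import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 512])
# Each example is normalized over its own feature dimension.
normed = tf.contrib.layers.layer_norm(inputs)
```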

`tf.contrib.layers.linear()`

A `functools.partial` of `fully_connected` with the activation function set
to `None`, i.e. a fully connected layer with a linear activation (see the
aliases note at the end of this section).

`tf.contrib.layers.max_pool2d(*args, **kwargs)`

Adds a 2D Max Pooling op.

Pooling is applied per image over the spatial dimensions, not over the batch or channel dimensions.

##### Args:

*  `inputs`: A 4-D tensor of shape `[batch_size, height, width, channels]` if `data_format` is `NHWC`, and `[batch_size, channels, height, width]` if `data_format` is `NCHW`.
*  `kernel_size`: A list of length 2: [kernel_height, kernel_width] of the pooling kernel over which the op is computed. Can be an int if both values are the same.
*  `stride`: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
*  `padding`: The padding method, either 'VALID' or 'SAME'.
*  `data_format`: A string. `NHWC` (default) and `NCHW` are supported.
*  `outputs_collections`: The collections to which the outputs are added.
*  `scope`: Optional scope for name_scope.

##### Returns:

A `Tensor` representing the results of the pooling operation.

##### Raises:

*  `ValueError`: If `data_format` is neither `NHWC` nor `NCHW`.
*  `ValueError`: If 'kernel_size' is not a 2-D list.

`tf.contrib.layers.one_hot_encoding(*args, **kwargs)`

Transform numeric labels into onehot_labels using `tf.one_hot`.

##### Args:

*  `labels`: [batch_size] target labels.
*  `num_classes`: Total number of classes.
*  `on_value`: A scalar defining the on-value.
*  `off_value`: A scalar defining the off-value.
*  `outputs_collections`: Collection to add the outputs.
*  `scope`: Optional scope for name_scope.

##### Returns:

One-hot encoding of the labels.
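
A tiny sketch (assuming TensorFlow 1.x):

```
import tensorflow as tf

labels = tf.constant([0, 2, 1])
# Shape [3] integer labels -> shape [3, 4] one-hot rows.
one_hot = tf.contrib.layers.one_hot_encoding(labels, num_classes=4)
# [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
```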

`tf.nn.relu(features, name=None)`

Computes rectified linear: `max(features, 0)`.

##### Args:

*  `features`: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`, `uint16`, `half`.
*  `name`: A name for the operation (optional).

##### Returns:

A `Tensor`. Has the same type as `features`.

`tf.nn.relu6(features, name=None)`

Computes Rectified Linear 6: `min(max(features, 0), 6)`.

##### Args:

*  `features`: A `Tensor` with type `float`, `double`, `int32`, `int64`, `uint8`, `int16`, or `int8`.
*  `name`: A name for the operation (optional).

##### Returns:

A `Tensor` with the same type as `features`.

`tf.contrib.layers.repeat(inputs, repetitions, layer, *args, **kwargs)`

Applies the same layer with the same arguments repeatedly.

```
y = repeat(x, 3, conv2d, 64, [3, 3], scope='conv1')
# It is equivalent to:
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_1')
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_2')
y = conv2d(x, 64, [3, 3], scope='conv1/conv1_3')
```

If the `scope` argument is not given in `kwargs`, it is set to
`layer.__name__`, or `layer.func.__name__` (for `functools.partial`
objects). If neither `__name__` nor `func.__name__` is available, the
layers are called with `scope='stack'`.

##### Args:

*  `inputs`: A `Tensor` suitable for layer.
*  `repetitions`: Int, number of repetitions.
*  `layer`: A layer with arguments `(inputs, *args, **kwargs)`.
*  `*args`: Extra args for the layer.
*  `**kwargs`: Extra kwargs for the layer.

##### Returns:

A tensor result of applying the layer, repetitions times.

##### Raises:

*  `ValueError`: If the op is unknown or wrong.

`tf.contrib.layers.safe_embedding_lookup_sparse(embedding_weights, sparse_ids, sparse_weights=None, combiner=None, default_id=None, name=None, partition_strategy='div', max_norm=None)`

Lookup embedding results, accounting for invalid IDs and empty features.

The partitioned embedding in `embedding_weights` must all be the same shape
except for the first dimension. The first dimension is allowed to vary as the
vocabulary size is not necessarily a multiple of `P`. `embedding_weights` may
be a `PartitionedVariable` as returned by using `tf.get_variable()` with a
partitioner.

Invalid IDs (< 0) are pruned from input IDs and weights, as well as any IDs
with non-positive weight. For an entry with no features, the embedding vector
for `default_id` is returned, or the 0-vector if `default_id` is not supplied.

The ids and weights may be multi-dimensional. Embeddings are always aggregated
along the last dimension.

##### Args:

*  `embedding_weights`: A list of `P` float tensors or values representing partitioned embedding tensors. Alternatively, a `PartitionedVariable`, created by partitioning along dimension 0. The total unpartitioned shape should be `[e_0, e_1, ..., e_m]`, where `e_0` represents the vocab size and `e_1, ..., e_m` are the embedding dimensions.
*  `sparse_ids`: `SparseTensor` of shape `[d_0, d_1, ..., d_n]` containing the ids. `d_0` is typically batch size.
*  `sparse_weights`: `SparseTensor` of same shape as `sparse_ids`, containing float weights corresponding to `sparse_ids`, or `None` if all weights are assumed to be 1.0.
*  `combiner`: A string specifying how to combine embedding results for each entry. Currently "mean", "sqrtn" and "sum" are supported, with "mean" the default.
*  `default_id`: The id to use for an entry with no features.
*  `name`: A name for this operation (optional).
*  `partition_strategy`: A string specifying the partitioning strategy. Currently `"div"` and `"mod"` are supported. Default is `"div"`.
*  `max_norm`: If not None, all embeddings are l2-normalized to max_norm before combining.

##### Returns:

Dense tensor of shape `[d_0, d_1, ..., d_{n-1}, e_1, ..., e_m]`.

##### Raises:

*  `ValueError`: if `embedding_weights` is empty.
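
A hedged sketch with a single unpartitioned shard (assuming TensorFlow 1.x; the ids and shapes are illustrative):

```
import tensorflow as tf

# One shard: vocabulary of 5 ids embedded in 3 dimensions.
embedding_weights = [tf.get_variable('embed', shape=[5, 3])]
# Example 0 has ids 1 and 4; example 1 has only an invalid id (-1),
# which is pruned, so it falls back to the 0-vector.
sparse_ids = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                             values=tf.constant([1, 4, -1], dtype=tf.int64),
                             dense_shape=[2, 2])
embedded = tf.contrib.layers.safe_embedding_lookup_sparse(
    embedding_weights, sparse_ids, combiner='mean')
# embedded has shape [2, 3].
```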

`tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter, strides, padding, rate=None, name=None)`

2-D convolution with separable filters.

Performs a depthwise convolution that acts separately on channels followed by
a pointwise convolution that mixes channels. Note that this is separability
between dimensions `[1, 2]` and `3`, not spatial separability between
dimensions `1` and `2`.

In detail,

```
output[b, i, j, k] = sum_{di, dj, q, r}
input[b, strides[1] * i + di, strides[2] * j + dj, q] *
depthwise_filter[di, dj, q, r] *
pointwise_filter[0, 0, q * channel_multiplier + r, k]
```

`strides` controls the strides for the depthwise convolution only, since
the pointwise convolution has implicit strides of `[1, 1, 1, 1]`. Must have
`strides[0] = strides[3] = 1`. For the most common case of the same
horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
If any value in `rate` is greater than 1, we perform atrous depthwise
convolution, in which case all values in the `strides` tensor must be equal
to 1.

##### Args:

*  `input`: 4-D `Tensor` with shape `[batch, in_height, in_width, in_channels]`.
*  `depthwise_filter`: 4-D `Tensor` with shape `[filter_height, filter_width, in_channels, channel_multiplier]`. Contains `in_channels` convolutional filters of depth 1.
*  `pointwise_filter`: 4-D `Tensor` with shape `[1, 1, channel_multiplier * in_channels, out_channels]`. Pointwise filter to mix channels after `depthwise_filter` has convolved spatially.
*  `strides`: 1-D of size 4. The strides for the depthwise convolution for each dimension of `input`.
*  `padding`: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
*  `rate`: 1-D of size 2. The dilation rate in which we sample input values across the `height` and `width` dimensions in atrous convolution. If it is greater than 1, then all values of strides must be 1.
*  `name`: A name for this operation (optional).

##### Returns:

A 4-D `Tensor` of shape `[batch, out_height, out_width, out_channels]`.

##### Raises:

*  `ValueError`: If channel_multiplier * in_channels > out_channels, which means that the separable convolution is overparameterized.

`tf.contrib.layers.separable_convolution2d(*args, **kwargs)`

Adds a depth-separable 2D convolution with optional batch_norm layer.

This op first performs a depthwise convolution that acts separately on
channels, creating a variable called `depthwise_weights`. If `num_outputs`
is not None, it adds a pointwise convolution that mixes channels, creating a
variable called `pointwise_weights`. Then, if `batch_norm_params` is None,
it adds bias to the result, creating a variable called 'biases', otherwise
it adds a batch normalization layer. It finally applies an activation function
to produce the end result.

##### Args:

*  `inputs`: A tensor of size [batch_size, height, width, channels].
*  `num_outputs`: The number of pointwise convolution output filters. If it is None, then we skip the pointwise convolution stage.
*  `kernel_size`: A list of length 2: [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
*  `depth_multiplier`: The number of depthwise convolution output channels for each input channel. The total number of depthwise convolution output channels will be equal to `num_filters_in * depth_multiplier`.
*  `stride`: A list of length 2: [stride_height, stride_width], specifying the depthwise convolution stride. Can be an int if both strides are the same.
*  `padding`: One of 'VALID' or 'SAME'.
*  `rate`: A list of length 2: [rate_height, rate_width], specifying the dilation rates for atrous convolution. Can be an int if both rates are the same. If any value is larger than one, then both stride values need to be one.
*  `activation_fn`: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
*  `normalizer_fn`: Normalization function to use instead of `biases`. If `normalizer_fn` is provided then `biases_initializer` and `biases_regularizer` are ignored and `biases` are not created nor added. Default set to None for no normalizer function.
*  `normalizer_params`: Normalization function parameters.
*  `weights_initializer`: An initializer for the weights.
*  `weights_regularizer`: Optional regularizer for the weights.
*  `biases_initializer`: An initializer for the biases. If None skip biases.
*  `biases_regularizer`: Optional regularizer for the biases.
*  `reuse`: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given.
*  `variables_collections`: Optional list of collections for all the variables or a dictionary containing a different list of collection per variable.
*  `outputs_collections`: Collection to add the outputs.
*  `trainable`: Whether or not the variables should be trainable.
*  `scope`: Optional scope for variable_scope.

##### Returns:

A `Tensor` representing the output of the operation.

`tf.nn.softmax(logits, dim=-1, name=None)`

Computes softmax activations.

For each batch `i` and class `j` we have

```
softmax = exp(logits) / reduce_sum(exp(logits), dim)
```

##### Args:

*  `logits`: A non-empty `Tensor`. Must be one of the following types: `half`, `float32`, `float64`.
*  `dim`: The dimension softmax would be performed on. The default is -1 which indicates the last dimension.
*  `name`: A name for the operation (optional).

##### Returns:

A `Tensor`. Has the same type as `logits`. Same shape as `logits`.

##### Raises:

*  `InvalidArgumentError`: if `logits` is empty or `dim` is beyond the last dimension of `logits`.
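
A quick numeric sketch (values rounded):

```
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
# probs ~= [[0.659, 0.242, 0.099]]; each row sums to 1.
```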

`tf.stack(values, axis=0, name='stack')`

Stacks a list of rank-`R` tensors into one rank-`(R+1)` tensor.

Packs the list of tensors in `values` into a tensor with rank one higher than
each tensor in `values`, by packing them along the `axis` dimension.
Given a list of length `N` of tensors of shape `(A, B, C)`:

if `axis == 0` then the `output` tensor will have the shape `(N, A, B, C)`.
if `axis == 1` then the `output` tensor will have the shape `(A, N, B, C)`.
Etc.

For example:

```
# 'x' is [1, 4]
# 'y' is [2, 5]
# 'z' is [3, 6]
stack([x, y, z]) => [[1, 4], [2, 5], [3, 6]] # Pack along first dim.
stack([x, y, z], axis=1) => [[1, 2, 3], [4, 5, 6]]
```

This is the opposite of unstack. The numpy equivalent is

```
tf.stack([x, y, z]) = np.asarray([x, y, z])
```

##### Args:

*  `values`: A list of `Tensor` objects with the same shape and type.
*  `axis`: An `int`. The axis to stack along. Defaults to the first dimension. Supports negative indexes.
*  `name`: A name for this operation (optional).

##### Returns:

*  `output`: A stacked `Tensor` with the same type as `values`.

##### Raises:

*  `ValueError`: If `axis` is out of the range [-(R+1), R+1).

`tf.contrib.layers.unit_norm(*args, **kwargs)`

Normalizes the given input across the specified dimension to unit length.

Note that the rank of `input` must be known.

##### Args:

*  `inputs`: A `Tensor` of arbitrary size.
*  `dim`: The dimension along which the input is normalized.
*  `epsilon`: A small value to add to the inputs to avoid dividing by zero.
*  `scope`: Optional scope for variable_scope.

##### Returns:

The normalized `Tensor`.

##### Raises:

*  `ValueError`: If dim is smaller than the number of dimensions in 'inputs'.

`tf.contrib.layers.embed_sequence(ids, vocab_size=None, embed_dim=None, unique=False, initializer=None, regularizer=None, trainable=True, scope=None, reuse=None)`

Maps a sequence of symbols to a sequence of embeddings.

A typical use case is reusing embeddings between an encoder and decoder.

##### Args:

*  `ids`: `[batch_size, doc_length]` `Tensor` of type `int32` or `int64` with symbol ids.
*  `vocab_size`: Integer number of symbols in vocabulary.
*  `embed_dim`: Integer number of dimensions for embedding matrix.
*  `unique`: If `True`, will first compute the unique set of indices, and then lookup each embedding once, repeating them in the output as needed.
*  `initializer`: An initializer for the embeddings, if `None` default for current scope is used.
*  `regularizer`: Optional regularizer for the embeddings.
*  `trainable`: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
*  `scope`: Optional string specifying the variable scope for the op, required if `reuse=True`.
*  `reuse`: If `True`, variables inside the op will be reused.

##### Returns:

`Tensor` of `[batch_size, doc_length, embed_dim]` with embedded sequences.

##### Raises:

*  `ValueError`: if `embed_dim` or `vocab_size` are not specified when `reuse` is `None` or `False`.
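
A minimal sketch (assuming TensorFlow 1.x; the vocabulary size and dimensions are illustrative):

```
import tensorflow as tf

ids = tf.placeholder(tf.int32, shape=[None, 50])  # [batch_size, doc_length]
# Each symbol id is mapped to an 8-dimensional embedding vector.
embedded = tf.contrib.layers.embed_sequence(ids, vocab_size=1000, embed_dim=8)
# embedded has shape [None, 50, 8].
```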

Aliases for `fully_connected` which set a default activation function are
available: `relu`, `relu6` and `linear`.

A `stack` operation is also available. It builds a stack of layers by applying
a layer repeatedly.
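
A hedged sketch of `stack` (assuming TensorFlow 1.x; this mirrors the equivalence shown for `repeat` above, but with per-call arguments):

```
import tensorflow as tf
from tensorflow.contrib import layers

x = tf.placeholder(tf.float32, shape=[None, 784])
# Applies fully_connected three times with 32, 64 and 128 units.
y = layers.stack(x, layers.fully_connected, [32, 64, 128], scope='fc')
# Roughly equivalent to:
#   x = layers.fully_connected(x, 32, scope='fc/fc_1')
#   x = layers.fully_connected(x, 64, scope='fc/fc_2')
#   y = layers.fully_connected(x, 128, scope='fc/fc_3')
```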