# Higher level ops for building neural network layers.

This package provides several ops that create the variables they use internally in a consistent way, and that provide the building blocks for many common machine learning algorithms.

### tf.contrib.layers.avg_pool2d(*args, **kwargs)

Adds a 2D average pooling op.

Pooling is applied to each image and each channel separately. It is not performed across the batch or channel dimensions.

##### Args:
• inputs: A 4-D tensor of shape [batch_size, height, width, channels] if data_format is NHWC, and [batch_size, channels, height, width] if data_format is NCHW.
• kernel_size: A list of length 2: [kernel_height, kernel_width] of the pooling kernel over which the op is computed. Can be an int if both values are the same.
• stride: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
• padding: The padding method, either 'VALID' or 'SAME'.
• data_format: A string. NHWC (default) and NCHW are supported.
• outputs_collections: The collections to which the outputs are added.
• scope: Optional scope for name_scope.
##### Returns:

A Tensor representing the results of the pooling operation.

##### Raises:
• ValueError: If data_format is neither NHWC nor NCHW.
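The averaging can be sketched in pure Python for a single-channel image with 'VALID' padding. This is a minimal sketch, not the TensorFlow implementation. The helper name `avg_pool_2d_valid` is hypothetical. The batch and channel dimensions are left out because pooling never mixes them.

```python
def avg_pool_2d_valid(image, kernel_size, stride):
    """Average-pool a 2-D list of numbers with 'VALID' padding.

    kernel_size and stride are (height, width) pairs. Windows that would
    extend past the image border are dropped, as with 'VALID'.
    """
    kh, kw = kernel_size
    sh, sw = stride
    rows, cols = len(image), len(image[0])
    out = []
    for top in range(0, rows - kh + 1, sh):
        row = []
        for left in range(0, cols - kw + 1, sw):
            window = [image[top + i][left + j] for i in range(kh) for j in range(kw)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(avg_pool_2d_valid(image, (2, 2), (2, 2)))  # [[3.5, 5.5], [11.5, 13.5]]
```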

### tf.contrib.layers.batch_norm(*args, **kwargs)

Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.

"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"

Sergey Ioffe, Christian Szegedy

Can be used as a normalizer function for conv2d and fully_connected.

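When is_training is True, moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they must be added as a dependency of the training step, for example:

```python
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops
# total_loss is assumed to be defined earlier in the graph.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)  # run all moving-average updates together
    total_loss = control_flow_ops.with_dependencies([updates], total_loss)
```

One can set updates_collections=None to force the updates in place, but that can incur a speed penalty, especially in distributed settings.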

##### Args:
• inputs: A tensor with 2 or more dimensions, where the first dimension is batch_size. The normalization is over all but the last dimension if data_format is NHWC, and over all but the second dimension if data_format is NCHW.
• decay: Decay for the moving average. Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower the decay value (decay=0.9 is recommended) if the model has reasonably good training performance but poor validation and/or test performance. Try zero_debias_moving_mean=True for improved stability.
• center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
• scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (or, e.g., nn.relu), this can be disabled since the scaling can be done by the next layer.
• epsilon: Small float added to variance to avoid dividing by zero.
• activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
• param_initializers: Optional initializers for beta, gamma, moving mean and moving variance.
• updates_collections: Collections to collect the update ops for computation. The update ops need to be executed with the train_op. If None, a control dependency is added to make sure the updates are computed in place.
• is_training: Whether or not the layer is in training mode. In training mode, it accumulates the statistics of the moments into moving_mean and moving_variance using an exponential moving average with the given decay. When not in training mode, it uses the values of moving_mean and moving_variance.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional collections for the variables.
• outputs_collections: Collections to add the outputs.
• trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
• batch_weights: An optional tensor of shape [batch_size], containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
• fused: Use nn.fused_batch_norm if True, nn.batch_normalization otherwise.
• data_format: A string. NHWC (default) and NCHW are supported.
• zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
• scope: Optional scope for variable_scope.
##### Returns:

A Tensor representing the output of the operation.

##### Raises:
• ValueError: If batch_weights is not None and fused is True.
• ValueError: If data_format is neither NHWC nor NCHW.
• ValueError: If the rank of inputs is undefined.
• ValueError: If rank or channels dimension of inputs is undefined.
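Training-mode behaviour can be sketched in pure Python for a [batch_size, channels] input. Each channel is normalized with its batch statistics, and the moving statistics are then updated as decay * moving + (1 - decay) * batch_statistic. This is a minimal sketch with these assumptions:

* beta = 0 and gamma = 1.
* The biased (population) variance is used.
* The decay and epsilon defaults are illustrative.

Details such as the variance estimator used for the moving variance may differ in TensorFlow. The name `batch_norm_train` is hypothetical.

```python
import math


def batch_norm_train(batch, moving_mean, moving_var, decay=0.999, epsilon=0.001):
    """Normalize a [batch_size][channels] list per channel (training mode).

    Returns the normalized batch and the updated moving statistics.
    beta = 0 and gamma = 1 are assumed (no offset, no scale).
    """
    n = len(batch)
    channels = len(batch[0])
    means = [sum(row[c] for row in batch) / n for c in range(channels)]
    variances = [sum((row[c] - means[c]) ** 2 for row in batch) / n
                 for c in range(channels)]
    normalized = [
        [(row[c] - means[c]) / math.sqrt(variances[c] + epsilon) for c in range(channels)]
        for row in batch
    ]
    # Exponential moving averages, as accumulated when is_training is True.
    new_mean = [decay * m + (1 - decay) * b for m, b in zip(moving_mean, means)]
    new_var = [decay * v + (1 - decay) * b for v, b in zip(moving_var, variances)]
    return normalized, new_mean, new_var


out, mm, mv = batch_norm_train([[1.0, 10.0], [3.0, 30.0]], [0.0, 0.0], [1.0, 1.0], decay=0.9)
```

At inference time (is_training=False), the stored moving_mean and moving_var would be used in place of the batch statistics.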

### tf.contrib.layers.convolution2d(*args, **kwargs)

Adds an N-D convolution followed by an optional batch_norm layer.

It is required that 1 <= N <= 3.

convolution creates a variable called weights, representing the convolutional kernel, that is convolved (actually cross-correlated) with the inputs to produce a Tensor of activations. If a normalizer_fn is provided (such as batch_norm), it is then applied. Otherwise, if normalizer_fn is None and a biases_initializer is provided, a biases variable is created and added to the activations. Finally, if activation_fn is not None, it is applied to the activations as well.

Performs a'trous convolution with input stride/dilation rate equal to rate if a value > 1 for any dimension of rate is specified. In this case stride values != 1 are not supported.

##### Args:
• inputs: A Tensor of rank N+2 of shape [batch_size] + input_spatial_shape + [in_channels] if data_format does not start with "NC" (default), or [batch_size, in_channels] + input_spatial_shape if data_format starts with "NC".
• num_outputs: Integer, the number of output filters.
• kernel_size: A sequence of N positive integers specifying the spatial dimensions of the filters. Can be a single integer to specify the same value for all spatial dimensions.
• stride: A sequence of N positive integers specifying the stride at which to compute output. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any rate value != 1.
• padding: One of "VALID" or "SAME".
• data_format: A string or None. Specifies whether the channel dimension of the input and output is the last dimension (default, or if data_format does not start with "NC"), or the second dimension (if data_format starts with "NC"). For N=1, the valid values are "NWC" (default) and "NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For N=3, currently the only valid value is "NDHWC".
• rate: A sequence of N positive integers specifying the dilation rate to use for a'trous convolution. Can be a single integer to specify the same value for all spatial dimensions. Specifying any rate value != 1 is incompatible with specifying any stride value != 1.
• activation_fn: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
• normalizer_fn: Normalization function to use instead of biases. If normalizer_fn is provided, then biases_initializer and biases_regularizer are ignored and biases are neither created nor added. Defaults to None for no normalizer function.
• normalizer_params: Normalization function parameters.
• weights_initializer: An initializer for the weights.
• weights_regularizer: Optional regularizer for the weights.
• biases_initializer: An initializer for the biases. If None, biases are skipped.
• biases_regularizer: Optional regularizer for the biases.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
• outputs_collections: Collection to add the outputs.
• trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
• scope: Optional scope for variable_scope.
##### Returns:

A tensor representing the output of the operation.

##### Raises:
• ValueError: If data_format is invalid.
• ValueError: If both rate and stride are not uniformly 1.
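Cross-correlation with a dilation rate can be sketched in one dimension. The kernel is not flipped, and its taps are spaced rate positions apart, so the effective kernel width becomes k + (k - 1) * (rate - 1). This is a minimal pure-Python sketch assuming stride 1 and 'VALID' padding, consistent with the rule that rate > 1 requires stride 1. The function name `atrous_cross_correlate_1d` is hypothetical.

```python
def atrous_cross_correlate_1d(inputs, kernel, rate=1):
    """'VALID' 1-D cross-correlation with dilation `rate` and stride 1.

    Kernel taps are spaced `rate` positions apart, so the effective kernel
    width is len(kernel) + (len(kernel) - 1) * (rate - 1). The kernel is
    not flipped, i.e. this is cross-correlation, not true convolution.
    """
    k = len(kernel)
    span = k + (k - 1) * (rate - 1)
    return [
        sum(kernel[t] * inputs[start + t * rate] for t in range(k))
        for start in range(len(inputs) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6]
print(atrous_cross_correlate_1d(signal, [1, -1]))          # [-1, -1, -1, -1, -1]
print(atrous_cross_correlate_1d(signal, [1, -1], rate=2))  # [-2, -2, -2, -2]
```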

### tf.contrib.layers.conv2d_in_plane(*args, **kwargs)

Applies the same in-plane convolution to each channel independently.

This is useful for performing various simple channel-independent convolution operations such as image gradients:

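```python
image = tf.constant(..., shape=(16, 240, 320, 3))
# No `kernel` argument exists; the [1, -1] filter is set via weights_initializer.
diff = dict(weights_initializer=tf.constant_initializer([1, -1]), activation_fn=None)
vert_gradients = layers.conv2d_in_plane(image, kernel_size=[2, 1], **diff)
horz_gradients = layers.conv2d_in_plane(image, kernel_size=[1, 2], **diff)
```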

##### Args:
• inputs: A 4-D tensor with dimensions [batch_size, height, width, channels].
• kernel_size: A list of length 2 holding the [kernel_height, kernel_width] of the in-plane convolution kernel. Can be an int if both values are the same.
• stride: A list of length 2 [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
• padding: The padding type to use, either 'SAME' or 'VALID'.
• activation_fn: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
• normalizer_fn: Normalization function to use instead of biases. If normalizer_fn is provided, then biases_initializer and biases_regularizer are ignored and biases are neither created nor added. Defaults to None for no normalizer function.
• normalizer_params: Normalization function parameters.
• weights_initializer: An initializer for the weights.
• weights_regularizer: Optional regularizer for the weights.
• biases_initializer: An initializer for the biases. If None, biases are skipped.
• biases_regularizer: Optional regularizer for the biases.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
• outputs_collections: Collection to add the outputs.
• trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
• scope: Optional scope for variable_scope.
##### Returns:

A Tensor representing the output of the operation.
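The per-channel behaviour can be sketched in pure Python for one image stored as [height][width][channels]. The same 2-D kernel is slid over every channel separately, and channels are never mixed. The sketch assumes 'VALID' padding, stride 1, and no bias or activation. The helper `in_plane_conv` is hypothetical.

```python
def in_plane_conv(image, kernel):
    """Apply one 2-D kernel to every channel of an [H][W][C] image separately.

    'VALID' padding, stride 1, no bias or activation. Channels never mix.
    """
    kh, kw = len(kernel), len(kernel[0])
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [
            [
                sum(kernel[i][j] * image[r + i][c + j][ch]
                    for i in range(kh) for j in range(kw))
                for ch in range(channels)
            ]
            for c in range(width - kw + 1)
        ]
        for r in range(height - kh + 1)
    ]


# A 3x2 image with two channels; channel 1 is 10x channel 0.
image = [[[1, 10], [2, 20]],
         [[4, 40], [5, 50]],
         [[9, 90], [7, 70]]]
# Vertical difference: kernel_size [2, 1] with weights [1, -1].
vert = in_plane_conv(image, [[1], [-1]])
print(vert)  # [[[-3, -30], [-3, -30]], [[-5, -50], [-2, -20]]]
```

The second channel's result is exactly 10x the first, which shows that each channel is filtered on its own.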

### tf.contrib.layers.convolution2d_in_plane(*args, **kwargs)

This op has the same documentation as conv2d_in_plane above. It takes the same arguments and returns the same result.

### tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding='SAME', data_format='NHWC', name=None)

The transpose of conv2d.

This operation is sometimes called "deconvolution" after Deconvolutional Networks, but is actually the transpose (gradient) of conv2d rather than an actual deconvolution.

##### Args:
• value: A 4-D Tensor of type float and shape [batch, height, width, in_channels] for NHWC data format or [batch, in_channels, height, width] for NCHW data format.
• filter: A 4-D Tensor with the same type as value and shape [height, width, output_channels, in_channels]. filter's in_channels dimension must match that of value.
• output_shape: A 1-D Tensor representing the output shape of the deconvolution op.
• strides: A list of ints. The stride of the sliding window for each dimension of the input tensor.
• padding: A string, either 'VALID' or 'SAME'. The padding algorithm.
• data_format: A string. 'NHWC' and 'NCHW' are supported.
• name: Optional name for the returned tensor.
##### Returns:

A Tensor with the same type as value.

##### Raises:
• ValueError: If input/output depth does not match filter's shape, or if padding is other than 'VALID' or 'SAME'.
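The "transpose (gradient)" description can be made concrete in one dimension. Each input element scatters a scaled copy of the kernel into the output, and overlapping copies add up. The sketch below also defines the forward 'VALID' cross-correlation, so the adjoint property <conv(x), y> == <x, conv_transpose(y)> can be checked. This is a minimal pure-Python sketch with these assumptions:

* Single channel.
* 'VALID' padding.
* The hypothetical helpers `conv1d_valid` and `conv1d_transpose_valid` are not TensorFlow functions.

```python
def conv1d_valid(x, kernel, stride):
    """Forward 'VALID' 1-D cross-correlation (single channel)."""
    n_out = (len(x) - len(kernel)) // stride + 1
    return [sum(w * x[i * stride + t] for t, w in enumerate(kernel)) for i in range(n_out)]


def conv1d_transpose_valid(value, kernel, stride):
    """1-D transposed convolution with 'VALID' padding (single channel).

    Each input element scatters a scaled copy of the kernel into the output,
    starting at position i * stride. The output length is
    (len(value) - 1) * stride + len(kernel).
    """
    out = [0] * ((len(value) - 1) * stride + len(kernel))
    for i, v in enumerate(value):
        for t, w in enumerate(kernel):
            out[i * stride + t] += v * w
    return out


print(conv1d_transpose_valid([1, 1], [1, 2, 3], 1))  # [1, 3, 5, 3]
print(conv1d_transpose_valid([1, 1], [1, 2, 3], 2))  # [1, 2, 4, 2, 3]
```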

### tf.contrib.layers.convolution2d_transpose(*args, **kwargs)

Adds a convolution2d_transpose with an optional batch normalization layer.

The function creates a variable called weights, representing the kernel, that is convolved with the input. If normalizer_fn is None and biases_initializer is not None, a second variable called biases is added to the result of the operation.

##### Args:
• inputs: A 4-D Tensor of type float and shape [batch, height, width, in_channels] for NHWC data format or [batch, in_channels, height, width] for NCHW data format.
• num_outputs: Integer, the number of output filters.
• kernel_size: A list of length 2 holding the [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
• stride: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
• padding: One of 'VALID' or 'SAME'.
• data_format: A string. NHWC (default) and NCHW are supported.
• activation_fn: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
• normalizer_fn: Normalization function to use instead of biases. If normalizer_fn is provided, then biases_initializer and biases_regularizer are ignored and biases are neither created nor added. Defaults to None for no normalizer function.
• normalizer_params: Normalization function parameters.
• weights_initializer: An initializer for the weights.
• weights_regularizer: Optional regularizer for the weights.
• biases_initializer: An initializer for the biases. If None, biases are skipped.
• biases_regularizer: Optional regularizer for the biases.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
• outputs_collections: Collection to add the outputs.
• trainable: Whether or not the variables should be trainable.
• scope: Optional scope for variable_scope.
##### Returns:

A tensor representing the output of the operation.

##### Raises:
• ValueError: If 'kernel_size' is not a list of length 2.
• ValueError: If data_format is neither NHWC nor NCHW.
• ValueError: If C dimension of inputs is None.

### tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)

Computes dropout.

With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged.

By default, each element is kept or dropped independently. If noise_shape is specified, it must be broadcastable to the shape of x, and only dimensions with noise_shape[i] == shape(x)[i] will make independent decisions. For example, if shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently and each row and column will be kept or not kept together.

##### Args:
• x: A tensor.
• keep_prob: A scalar Tensor with the same type as x. The probability that each element is kept.
• noise_shape: A 1-D Tensor of type int32, representing the shape for randomly generated keep/drop flags.
• seed: A Python integer. Used to create random seeds. See set_random_seed for behavior.
• name: A name for this operation (optional).
##### Returns:

A Tensor of the same shape as x.

##### Raises:
• ValueError: If keep_prob is not in (0, 1].
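The keep-and-rescale rule can be sketched in pure Python. Each element is kept with probability keep_prob and scaled by 1 / keep_prob, so the expected value of every output element equals the input element. This is a minimal sketch that draws an independent decision per element. It does not model noise_shape. The function name `dropout` here is a hypothetical stand-in.

```python
import random


def dropout(x, keep_prob, seed=None):
    """Keep each element with probability keep_prob and scale it by 1 / keep_prob.

    Dropped elements become 0, so each output element has the same
    expected value as the corresponding input element.
    """
    if not 0 < keep_prob <= 1:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in x]


sample = dropout([2.0] * 1000, keep_prob=0.5, seed=0)
print(sum(sample) / len(sample))  # close to 2.0 on average
```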

### tf.contrib.layers.flatten(*args, **kwargs)

Flattens the input while maintaining the batch_size.

Assumes that the first dimension represents the batch.

##### Args:
• inputs: A tensor of size [batch_size, ...].
• outputs_collections: Collection to add the outputs.
• scope: Optional scope for name_scope.
##### Returns:

A flattened tensor with shape [batch_size, k].

##### Raises:
• ValueError: If inputs rank is unknown or less than 2.
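The output shape follows directly from the description: the batch dimension is kept and all remaining dimensions are multiplied into k. The sketch below computes only that shape, with the same rank check. The helper `flattened_shape` is hypothetical.

```python
import operator
from functools import reduce


def flattened_shape(shape):
    """Shape after flattening everything except the leading batch dimension."""
    if len(shape) < 2:
        raise ValueError("inputs must have rank at least 2")
    batch_size, *rest = shape
    return [batch_size, reduce(operator.mul, rest, 1)]


print(flattened_shape([32, 7, 7, 64]))  # [32, 3136]
```

Because the batch dimension is never multiplied, an unknown batch size (None) passes through unchanged.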

### tf.contrib.layers.fully_connected(*args, **kwargs)

Adds a fully connected layer.

fully_connected creates a variable called weights, representing a fully connected weight matrix, which is multiplied by the inputs to produce a Tensor of hidden units. If a normalizer_fn is provided (such as batch_norm), it is then applied. Otherwise, if normalizer_fn is None and a biases_initializer is provided, a biases variable is created and added to the hidden units. Finally, if activation_fn is not None, it is applied to the hidden units as well.

##### Args:
• inputs: A tensor of at least rank 2 with a static value for the last dimension, e.g. [batch_size, depth] or [None, None, None, channels].
• num_outputs: Integer or long, the number of output units in the layer.
• activation_fn: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
• normalizer_fn: Normalization function to use instead of biases. If normalizer_fn is provided, then biases_initializer and biases_regularizer are ignored and biases are neither created nor added. Defaults to None for no normalizer function.
• normalizer_params: Normalization function parameters.
• weights_initializer: An initializer for the weights.
• weights_regularizer: Optional regularizer for the weights.
• biases_initializer: An initializer for the biases. If None, biases are skipped.
• biases_regularizer: Optional regularizer for the biases.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional list of collections for all the variables or a dictionary containing a different list of collections per variable.
• outputs_collections: Collection to add the outputs.
• trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
• scope: Optional scope for variable_scope.
##### Returns:

The tensor variable representing the result of the series of operations.

##### Raises:
• ValueError: If inputs has rank less than 2 or if its last dimension is not set.
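The order of operations described above can be sketched with plain lists:

1. Multiply by the weights.
2. Apply normalizer_fn if one is given; otherwise add the biases.
3. Apply activation_fn.

This is a minimal sketch, assuming a [batch_size][depth] input and a [depth][num_outputs] weight matrix. `toy_fully_connected` is a hypothetical name, not the TensorFlow function, and it creates no variables.

```python
def toy_fully_connected(inputs, weights, biases=None, normalizer_fn=None, activation_fn=None):
    """Dense layer following the documented order of operations.

    inputs: [batch_size][depth]; weights: [depth][num_outputs].
    Biases are only added when no normalizer_fn is given.
    """
    hidden = [
        [sum(x * weights[d][o] for d, x in enumerate(row)) for o in range(len(weights[0]))]
        for row in inputs
    ]
    if normalizer_fn is not None:
        hidden = normalizer_fn(hidden)  # biases are ignored in this case
    elif biases is not None:
        hidden = [[h + b for h, b in zip(row, biases)] for row in hidden]
    if activation_fn is not None:
        hidden = [[activation_fn(h) for h in row] for row in hidden]
    return hidden


def relu(v):
    return max(v, 0)


print(toy_fully_connected([[1, 2]], [[1, -1], [1, -1]], biases=[1, 1], activation_fn=relu))
# [[4, 0]]
```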

### tf.contrib.layers.layer_norm(*args, **kwargs)

Adds a Layer Normalization layer from https://arxiv.org/abs/1607.06450.

"Layer Normalization"

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

Can be used as a normalizer function for conv2d and fully_connected.

##### Args:
• inputs: A tensor with 2 or more dimensions. The normalization occurs over all but the first dimension.
• center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
• scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (or, e.g., nn.relu), this can be disabled since the scaling can be done by the next layer.
• activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional collections for the variables.
• outputs_collections: Collections to add the outputs.
• trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
• scope: Optional scope for variable_scope.
##### Returns:

A Tensor representing the output of the operation.

##### Raises:
• ValueError: If rank or last dimension of inputs is undefined.
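Unlike batch_norm, layer normalization computes its statistics per example over the non-batch dimensions, so it does not depend on the other items in the batch. This is a minimal pure-Python sketch for a 2-D [batch_size][features] input. It assumes beta = 0 and gamma = 1, and uses an illustrative epsilon. The name `layer_norm_2d` is hypothetical.

```python
import math


def layer_norm_2d(batch, epsilon=1e-12):
    """Normalize each example over all of its features (every dim but the first).

    batch is [batch_size][features]; beta = 0 and gamma = 1 are assumed.
    """
    out = []
    for row in batch:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + epsilon) for v in row])
    return out


print(layer_norm_2d([[1.0, 3.0], [10.0, 30.0]]))  # both rows become roughly [-1.0, 1.0]
```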

### tf.contrib.layers.linear()

An alias of fully_connected created with functools.partial, with activation_fn set to None so that the layer output is linear. It accepts the same arguments as fully_connected.
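The alias pattern can be sketched with functools.partial applied to a toy layer. `toy_fully_connected` and its ReLU default are hypothetical stand-ins for fully_connected. Pinning activation_fn=None yields the linear variant, and pinning a ReLU yields the relu variant.

```python
import functools


def relu(v):
    return max(v, 0.0)


def toy_fully_connected(inputs, weights, activation_fn=relu):
    """Hypothetical single-output dense layer: dot product, then optional activation."""
    out = sum(x * w for x, w in zip(inputs, weights))
    return out if activation_fn is None else activation_fn(out)


# Aliases that pin a default activation function.
toy_linear = functools.partial(toy_fully_connected, activation_fn=None)
toy_relu = functools.partial(toy_fully_connected, activation_fn=relu)

print(toy_fully_connected([1.0, -2.0], [1.0, 1.0]))  # 0.0 (ReLU clips -1.0)
print(toy_linear([1.0, -2.0], [1.0, 1.0]))            # -1.0
```

A partial object has no `__name__` of its own. The wrapped function is reachable through its `func` attribute, which is why repeat falls back to `layer.func.__name__`.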

### tf.contrib.layers.max_pool2d(*args, **kwargs)

Adds a 2D Max Pooling op.

Pooling is applied to each image and each channel separately. It is not performed across the batch or channel dimensions.

##### Args:
• inputs: A 4-D tensor of shape [batch_size, height, width, channels] if data_format is NHWC, and [batch_size, channels, height, width] if data_format is NCHW.
• kernel_size: A list of length 2: [kernel_height, kernel_width] of the pooling kernel over which the op is computed. Can be an int if both values are the same.
• stride: A list of length 2: [stride_height, stride_width]. Can be an int if both strides are the same. Note that presently both strides must have the same value.
• padding: The padding method, either 'VALID' or 'SAME'.
• data_format: A string. NHWC (default) and NCHW are supported.
• outputs_collections: The collections to which the outputs are added.
• scope: Optional scope for name_scope.
##### Returns:

A Tensor representing the results of the pooling operation.

##### Raises:
• ValueError: If data_format is neither NHWC nor NCHW.
• ValueError: If kernel_size is not a list of length 2.
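The sketch below shows max pooling on a single-channel image, together with the output size per spatial dimension under the usual TensorFlow padding rules:

* 'SAME' gives ceil(in / stride).
* 'VALID' gives ceil((in - kernel + 1) / stride).

It is a minimal pure-Python sketch assuming square kernels and equal strides. Both helper names are hypothetical.

```python
import math


def pooled_size(in_size, kernel, stride, padding):
    """Output length along one spatial dimension for 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def max_pool_2d_valid(image, k, s):
    """Max over each k x k window with stride s and 'VALID' padding."""
    return [[max(image[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(0, len(image[0]) - k + 1, s)]
            for r in range(0, len(image) - k + 1, s)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool_2d_valid(image, 2, 2))                          # [[6, 8], [14, 16]]
print(pooled_size(7, 2, 2, "SAME"), pooled_size(7, 2, 2, "VALID"))  # 4 3
```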

### tf.contrib.layers.one_hot_encoding(*args, **kwargs)

Transforms numeric labels into one-hot labels using tf.one_hot.

##### Args:
• labels: [batch_size] target labels.
• num_classes: Total number of classes.
• on_value: A scalar defining the on-value.
• off_value: A scalar defining the off-value.
• outputs_collections: Collection to add the outputs.
• scope: Optional scope for name_scope.
##### Returns:

One-hot encoding of the labels.
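The mapping from [batch_size] labels to [batch_size, num_classes] rows can be sketched with plain lists. The helper `one_hot_rows` is hypothetical. Its on_value and off_value parameters mirror the arguments listed above.

```python
def one_hot_rows(labels, num_classes, on_value=1.0, off_value=0.0):
    """Map [batch_size] integer labels to [batch_size][num_classes] rows."""
    return [[on_value if c == label else off_value for c in range(num_classes)]
            for label in labels]


print(one_hot_rows([0, 2], 3))  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```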

### tf.nn.relu(features, name=None)

Computes rectified linear: max(features, 0).

##### Args:
• features: A Tensor. Must be one of the following types: float32, float64, int32, int64, uint8, int16, int8, uint16, half.
• name: A name for the operation (optional).
##### Returns:

A Tensor. Has the same type as features.

### tf.nn.relu6(features, name=None)

Computes Rectified Linear 6: min(max(features, 0), 6).

##### Args:
• features: A Tensor with type float, double, int32, int64, uint8, int16, or int8.
• name: A name for the operation (optional).
##### Returns:

A Tensor with the same type as features.
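Both activations are elementwise, so their formulas can be written as scalar Python functions: relu is max(x, 0), and relu6 clips that result at 6. These are illustrative scalar versions, not the TensorFlow ops.

```python
def relu(x):
    """max(x, 0)"""
    return max(x, 0)


def relu6(x):
    """min(max(x, 0), 6)"""
    return min(max(x, 0), 6)


print([relu(v) for v in (-3, 0, 2.5, 9)])   # [0, 0, 2.5, 9]
print([relu6(v) for v in (-3, 0, 2.5, 9)])  # [0, 0, 2.5, 6]
```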

### tf.contrib.layers.repeat(inputs, repetitions, layer, *args, **kwargs)

Applies the same layer with the same arguments repeatedly.

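```python
from tensorflow.contrib.layers import conv2d, repeat

# x is assumed to be an existing 4-D input tensor.
y = repeat(x, 3, conv2d, 64, [3, 3], scope='conv1')
# It is equivalent to:
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_1')
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_2')
y = conv2d(x, 64, [3, 3], scope='conv1/conv1_3')
```
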
If the scope argument is not given in kwargs, it is set to `layer.__name__`, or `layer.func.__name__` (for functools.partial objects). If neither `__name__` nor `func.__name__` is available, the layers are called with scope='stack'.

##### Args:
• inputs: A Tensor suitable for layer.
• repetitions: Int, number of repetitions.
• layer: A layer with arguments (inputs, *args, **kwargs)
• *args: Extra args for the layer.
• **kwargs: Extra kwargs for the layer.
##### Returns:

A tensor result of applying the layer, repetitions times.

##### Raises:
• ValueError: If the op is unknown or wrong.
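The scope-naming rules above can be sketched without TensorFlow. The layer is called repeatedly with scopes `<scope>/<scope>_1`, `<scope>/<scope>_2`, and so on. The scope name falls back to `layer.__name__`, then `layer.func.__name__`, then 'stack'. This is a minimal sketch that passes scope as a keyword and does not open real variable scopes. The recording `conv2d` below is a hypothetical stand-in layer.

```python
import functools


def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Apply `layer` `repetitions` times with scopes <scope>/<scope>_<i>."""
    scope = kwargs.pop("scope", None)
    if scope is None:
        if hasattr(layer, "__name__"):
            scope = layer.__name__
        elif hasattr(layer, "func") and hasattr(layer.func, "__name__"):
            scope = layer.func.__name__  # functools.partial objects
        else:
            scope = "stack"
    outputs = inputs
    for i in range(repetitions):
        kwargs["scope"] = f"{scope}/{scope}_{i + 1}"
        outputs = layer(outputs, *args, **kwargs)
    return outputs


calls = []


def conv2d(x, num_outputs, kernel_size, scope):
    """Hypothetical layer that records its scope and adds 1 to its input."""
    calls.append(scope)
    return x + 1


y = repeat(0, 3, conv2d, 64, [3, 3], scope="conv1")
print(y, calls)  # 3 ['conv1/conv1_1', 'conv1/conv1_2', 'conv1/conv1_3']
```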

### tf.contrib.layers.safe_embedding_lookup_sparse(embedding_weights, sparse_ids, sparse_weights=None, combiner=None, default_id=None, name=None, partition_strategy='div', max_norm=None)

Lookup embedding results, accounting for invalid IDs and empty features.

The partitioned embeddings in embedding_weights must all have the same shape except for the first dimension. The first dimension may vary because the vocabulary size is not necessarily a multiple of P, the number of partitions. embedding_weights may be a PartitionedVariable as returned by using tf.get_variable() with a partitioner.

Invalid IDs (< 0) are pruned from input IDs and weights, as well as any IDs with non-positive weight. For an entry with no features, the embedding vector for default_id is returned, or the 0-vector if default_id is not supplied.

The ids and weights may be multi-dimensional. Embeddings are always aggregated along the last dimension.

##### Args:
• embedding_weights: A list of P float tensors or values representing partitioned embedding tensors. Alternatively, a PartitionedVariable, created by partitioning along dimension 0. The total unpartitioned shape should be [e_0, e_1, ..., e_m], where e_0 represents the vocab size and e_1, ..., e_m are the embedding dimensions.
• sparse_ids: SparseTensor of shape [d_0, d_1, ..., d_n] containing the ids. d_0 is typically batch size.
• sparse_weights: SparseTensor of the same shape as sparse_ids, containing float weights corresponding to sparse_ids, or None if all weights are assumed to be 1.0.
• combiner: A string specifying how to combine embedding results for each entry. Currently "mean", "sqrtn" and "sum" are supported, with "mean" the default.
• default_id: The id to use for an entry with no features.
• name: A name for this operation (optional).
• partition_strategy: A string specifying the partitioning strategy. Currently "div" and "mod" are supported. Default is "div".
• max_norm: If not None, all embeddings are l2-normalized to max_norm before combining.
##### Returns:

Dense tensor of shape [d_0, d_1, ..., d_{n-1}, e_1, ..., e_m].

##### Raises:
• ValueError: if embedding_weights is empty.
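The pruning and combining rules for a single entry can be sketched in pure Python:

* Drop ids < 0 and ids with non-positive weight.
* Return the default_id vector (or zeros) when nothing is left.
* Otherwise combine the surviving embeddings:
  * "sum" is the weighted sum.
  * "mean" divides the weighted sum by the total weight.
  * "sqrtn" divides it by the square root of the sum of squared weights.

This is a minimal sketch under these assumptions:

* A single unpartitioned embedding table.
* One entry at a time.
* No max_norm.

The helper name `safe_lookup_entry` is hypothetical.

```python
import math


def safe_lookup_entry(embeddings, ids, weights=None, combiner="mean", default_id=None):
    """Combine the embeddings for one entry, pruning invalid ids first."""
    if weights is None:
        weights = [1.0] * len(ids)
    kept = [(i, w) for i, w in zip(ids, weights) if i >= 0 and w > 0]
    dim = len(embeddings[0])
    if not kept:
        # Empty entry: default_id's vector, or a zero vector.
        return list(embeddings[default_id]) if default_id is not None else [0.0] * dim
    total = [sum(w * embeddings[i][d] for i, w in kept) for d in range(dim)]
    if combiner == "sum":
        return total
    if combiner == "mean":
        norm = sum(w for _, w in kept)
    elif combiner == "sqrtn":
        norm = math.sqrt(sum(w * w for _, w in kept))
    else:
        raise ValueError("combiner must be 'mean', 'sqrtn' or 'sum'")
    return [t / norm for t in total]


emb = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
print(safe_lookup_entry(emb, [0, 1]))       # [0.5, 1.0]
print(safe_lookup_entry(emb, [-1, 2]))      # [4.0, 4.0]  (id -1 pruned)
print(safe_lookup_entry(emb, [-1]))         # [0.0, 0.0]  (empty entry)
```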

### tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter, strides, padding, rate=None, name=None)

2-D convolution with separable filters.

Performs a depthwise convolution that acts separately on channels followed by a pointwise convolution that mixes channels. Note that this is separability between dimensions [1, 2] and 3, not spatial separability between dimensions 1 and 2.

In detail,

```
output[b, i, j, k] = sum_{di, dj, q, r}
    input[b, strides[1] * i + di, strides[2] * j + dj, q] *
    depthwise_filter[di, dj, q, r] *
    pointwise_filter[0, 0, q * channel_multiplier + r, k]
```


strides controls the strides for the depthwise convolution only, since the pointwise convolution has implicit strides of [1, 1, 1, 1]. Must have strides[0] = strides[3] = 1. For the most common case of the same horizontal and vertical strides, strides = [1, stride, stride, 1]. If any value in rate is greater than 1, we perform atrous depthwise convolution, in which case all values in the strides tensor must be equal to 1.

##### Args:
• input: 4-D Tensor with shape [batch, in_height, in_width, in_channels].
• depthwise_filter: 4-D Tensor with shape [filter_height, filter_width, in_channels, channel_multiplier]. Contains in_channels convolutional filters of depth 1.
• pointwise_filter: 4-D Tensor with shape [1, 1, channel_multiplier * in_channels, out_channels]. Pointwise filter to mix channels after depthwise_filter has convolved spatially.
• strides: 1-D of size 4. The strides for the depthwise convolution for each dimension of input.
• padding: A string, either 'VALID' or 'SAME'. The padding algorithm.
• rate: 1-D of size 2. The dilation rate in which we sample input values across the height and width dimensions in atrous convolution. If it is greater than 1, then all values of strides must be 1.
• name: A name for this operation (optional).
##### Returns:

A 4-D Tensor of shape [batch, out_height, out_width, out_channels].

##### Raises:
• ValueError: If channel_multiplier * in_channels > out_channels, which means that the separable convolution is overparameterized.
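The index formula for separable_conv2d can be evaluated directly with nested loops for a single image. This is a minimal pure-Python sketch with these assumptions:

* 'VALID' padding.
* Equal horizontal and vertical strides.
* No dilation.
* The pointwise filter passed as a [channel_multiplier * in_channels][out_channels] list, with its leading 1x1 dimensions dropped.

The name `separable_conv2d_valid` is hypothetical.

```python
def separable_conv2d_valid(inputs, depthwise, pointwise, stride=1):
    """Direct evaluation of the separable convolution index formula.

    inputs:    [in_height][in_width][in_channels]
    depthwise: [filter_height][filter_width][in_channels][channel_multiplier]
    pointwise: [channel_multiplier * in_channels][out_channels]
    """
    fh, fw = len(depthwise), len(depthwise[0])
    in_ch, mult = len(depthwise[0][0]), len(depthwise[0][0][0])
    out_ch = len(pointwise[0])
    out_h = (len(inputs) - fh) // stride + 1
    out_w = (len(inputs[0]) - fw) // stride + 1
    return [[[sum(inputs[stride * i + di][stride * j + dj][q]
                  * depthwise[di][dj][q][r]
                  * pointwise[q * mult + r][k]
                  for di in range(fh) for dj in range(fw)
                  for q in range(in_ch) for r in range(mult))
              for k in range(out_ch)]
             for j in range(out_w)]
            for i in range(out_h)]


ones = [[[1] for _ in range(3)] for _ in range(3)]  # 3x3 image, 1 channel
print(separable_conv2d_valid(ones, [[[[1]], [[1]]], [[[1]], [[1]]]], [[2.0]]))
# [[[8.0], [8.0]], [[8.0], [8.0]]]
```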

### tf.contrib.layers.separable_convolution2d(*args, **kwargs)

Adds a depth-separable 2D convolution with optional batch_norm layer.

This op first performs a depthwise convolution that acts separately on channels, creating a variable called depthwise_weights. If num_outputs is not None, it adds a pointwise convolution that mixes channels, creating a variable called pointwise_weights. Then, if normalizer_fn is None, it adds bias to the result, creating a variable called biases. Otherwise, it applies the normalizer function (such as batch_norm). It finally applies an activation function to produce the end result.

##### Args:
• inputs: A tensor of size [batch_size, height, width, channels].
• num_outputs: The number of pointwise convolution output filters. If None, the pointwise convolution stage is skipped.
• kernel_size: A list of length 2: [kernel_height, kernel_width] of the filters. Can be an int if both values are the same.
• depth_multiplier: The number of depthwise convolution output channels for each input channel. The total number of depthwise convolution output channels will be equal to the number of input channels times depth_multiplier.
• stride: A list of length 2: [stride_height, stride_width], specifying the depthwise convolution stride. Can be an int if both strides are the same.
• padding: One of 'VALID' or 'SAME'.
• rate: A list of length 2: [rate_height, rate_width], specifying the dilation rates for a'trous convolution. Can be an int if both rates are the same. If any value is larger than one, then both stride values need to be one.
• activation_fn: Activation function. The default value is a ReLU function. Explicitly set it to None to skip it and maintain a linear activation.
• normalizer_fn: Normalization function to use instead of biases. If normalizer_fn is provided, then biases_initializer and biases_regularizer are ignored and biases are neither created nor added. Defaults to None for no normalizer function.
• normalizer_params: Normalization function parameters.
• weights_initializer: An initializer for the weights.
• weights_regularizer: Optional regularizer for the weights.
• biases_initializer: An initializer for the biases. If None, biases are skipped.
• biases_regularizer: Optional regularizer for the biases.
• reuse: Whether or not the layer and its variables should be reused. To reuse the layer, scope must be given.
• variables_collections: Optional list of collections for all the variables, or a dictionary containing a different list of collections per variable.
• outputs_collections: Collection to add the outputs.
• trainable: Whether or not the variables should be trainable.
• scope: Optional scope for variable_scope.
##### Returns:

A Tensor representing the output of the operation.

### tf.nn.softmax(logits, dim=-1, name=None)

Computes softmax activations.

For each batch i and class j we have

```
softmax[i, j] = exp(logits[i, j]) / sum_k exp(logits[i, k])
```

where k runs over the classes along dimension dim.

##### Args:
• logits: A non-empty Tensor. Must be one of the following types: half, float32, float64.
• dim: The dimension softmax would be performed on. The default is -1 which indicates the last dimension.
• name: A name for the operation (optional).
##### Returns:

A Tensor. Has the same type as logits. Same shape as logits.

##### Raises:
• InvalidArgumentError: if logits is empty or dim is beyond the last dimension of logits.
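The formula can be sketched for a 1-D list of logits. Subtracting the maximum logit before exponentiating leaves the result unchanged, because the common factor cancels, and it avoids overflow for large logits. That stabilization is a standard technique used in this sketch. It is not a claim about TensorFlow's kernel.

```python
import math


def softmax(logits):
    """Softmax over a 1-D list; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([0.0, math.log(3.0)]))  # approximately [0.25, 0.75]
print(softmax([1000.0, 1000.0]))      # [0.5, 0.5], no overflow
```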

### tf.stack(values, axis=0, name='stack')

Stacks a list of rank-R tensors into one rank-(R+1) tensor.

Packs the list of tensors in values into a tensor with rank one higher than each tensor in values, by packing them along the axis dimension. Given a list of length N of tensors of shape (A, B, C):

If axis == 0, the output tensor has shape (N, A, B, C). If axis == 1, it has shape (A, N, B, C), and so on.

For example:

```python
import tensorflow as tf

x = tf.constant([1, 4])
y = tf.constant([2, 5])
z = tf.constant([3, 6])
tf.stack([x, y, z])          # => [[1, 4], [2, 5], [3, 6]]  (pack along first dim)
tf.stack([x, y, z], axis=1)  # => [[1, 2, 3], [4, 5, 6]]
```


This is the opposite of unstack. The NumPy equivalent of tf.stack([x, y, z]) is np.asarray([x, y, z]).

##### Args:
• values: A list of Tensor objects with the same shape and type.
• axis: An int. The axis to stack along. Defaults to the first dimension. Supports negative indexes.
• name: A name for this operation (optional).
##### Returns:
• output: A stacked Tensor with the same type as values.
##### Raises:
• ValueError: If axis is out of the range [-(R+1), R+1).
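For rank-1 inputs (R = 1), stacking along axis 0 lists the inputs as rows, and stacking along axis 1 places them as columns. The valid axis range is [-2, 2). This is a minimal pure-Python sketch reproducing the x, y, z example. The helper `stack_1d` is hypothetical.

```python
def stack_1d(values, axis=0):
    """Stack equal-length 1-D lists into a 2-D list along axis 0 or 1 (or -2/-1)."""
    if axis < 0:
        axis += 2  # rank of the output is R + 1 = 2
    if axis == 0:
        return [list(v) for v in values]
    if axis == 1:
        return [list(col) for col in zip(*values)]
    raise ValueError("axis out of range [-2, 2)")


x, y, z = [1, 4], [2, 5], [3, 6]
print(stack_1d([x, y, z]))          # [[1, 4], [2, 5], [3, 6]]
print(stack_1d([x, y, z], axis=1))  # [[1, 2, 3], [4, 5, 6]]
```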

### tf.contrib.layers.unit_norm(*args, **kwargs)

Normalizes the given input across the specified dimension to unit length.

Note that the rank of input must be known.

##### Args:
• inputs: A Tensor of arbitrary size.
• dim: The dimension along which the input is normalized.
• epsilon: A small value to add to the inputs to avoid dividing by zero.
• scope: Optional scope for variable_scope.
##### Returns:

The normalized Tensor.

##### Raises:
• ValueError: If dim is negative or not smaller than the number of dimensions in inputs.
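Normalizing along dim=1 of a 2-D input divides each row by its L2 length. This is a minimal pure-Python sketch. As an assumption, it adds epsilon to the sum of squares before taking the square root, so an all-zero row stays zero instead of dividing by zero. The name `unit_norm_rows` is hypothetical.

```python
import math


def unit_norm_rows(rows, epsilon=1e-7):
    """Scale each row of a 2-D list to unit L2 length (normalizing along dim=1).

    Assumption: epsilon is added to the sum of squares before the square root.
    """
    out = []
    for row in rows:
        length = math.sqrt(epsilon + sum(v * v for v in row))
        out.append([v / length for v in row])
    return out


print(unit_norm_rows([[3.0, 4.0], [0.0, 0.0]]))  # roughly [[0.6, 0.8], [0.0, 0.0]]
```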

### tf.contrib.layers.embed_sequence(ids, vocab_size=None, embed_dim=None, unique=False, initializer=None, regularizer=None, trainable=True, scope=None, reuse=None)

Maps a sequence of symbols to a sequence of embeddings.

A typical use case is reusing embeddings between an encoder and a decoder.

##### Args:
• ids: [batch_size, doc_length] Tensor of type int32 or int64 with symbol ids.
• vocab_size: Integer number of symbols in vocabulary.
• embed_dim: Integer number of dimensions for embedding matrix.
• unique: If True, will first compute the unique set of indices, and then lookup each embedding once, repeating them in the output as needed.
• initializer: An initializer for the embeddings, if None default for current scope is used.
• regularizer: Optional regularizer for the embeddings.
• trainable: If True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
• scope: Optional string specifying the variable scope for the op, required if reuse=True.
• reuse: If True, variables inside the op will be reused.
##### Returns:

Tensor of [batch_size, doc_length, embed_dim] with embedded sequences.

##### Raises:
• ValueError: If embed_dim or vocab_size is not specified when reuse is None or False.
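The effect of unique=True can be sketched with a plain lookup table. Each distinct id is looked up once and the result is reused wherever that id repeats, which gives the same [batch_size, doc_length, embed_dim] output with fewer lookups. This is a minimal pure-Python sketch assuming the embedding table already exists. The name `toy_embed_sequence` is hypothetical and creates no variables.

```python
def toy_embed_sequence(ids, embeddings, unique=False):
    """Look up a [batch_size][doc_length] id grid in an embedding table.

    With unique=True, each distinct id is looked up once and reused.
    """
    if not unique:
        return [[list(embeddings[i]) for i in doc] for doc in ids]
    distinct = sorted({i for doc in ids for i in doc})
    looked_up = {i: list(embeddings[i]) for i in distinct}
    return [[looked_up[i] for i in doc] for doc in ids]


table = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # vocab_size=3, embed_dim=2
batch = [[1, 2, 1], [0, 0, 2]]                # batch_size=2, doc_length=3
print(toy_embed_sequence(batch, table))
```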

Aliases for fully_connected which set a default activation function are available: relu, relu6 and linear.

A layer-level stack operation (tf.contrib.layers.stack, distinct from tf.stack above) is also available. It builds a stack of layers by applying a layer repeatedly.