# Classification

### tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)

Computes sigmoid cross entropy given logits.

Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.

For brevity, let x = logits, z = targets. The logistic loss is

  z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x + log(1 + exp(-x))
= x - x * z + log(1 + exp(-x))


For x < 0, to avoid overflow in exp(-x), we reformulate the above

  x - x * z + log(1 + exp(-x))
= log(exp(x)) - x * z + log(1 + exp(-x))
= - x * z + log(1 + exp(x))


Hence, to ensure stability and avoid overflow, the implementation uses this equivalent formulation

max(x, 0) - x * z + log(1 + exp(-abs(x)))


logits and targets must have the same type and shape.
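
The stable formulation above can be checked numerically. The following is a minimal sketch in plain Python for a single scalar logit and target, not the TensorFlow implementation; the function names `sigmoid_cross_entropy` and `naive_sigmoid_cross_entropy` are hypothetical, and `math.log1p(v)` is used as an equivalent of `log(1 + v)`.

```python
import math

def sigmoid_cross_entropy(x, z):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-abs(x)))."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def naive_sigmoid_cross_entropy(x, z):
    """Direct form: x - x * z + log(1 + exp(-x)).

    math.exp(-x) overflows for very negative x.
    """
    return x - x * z + math.log(1.0 + math.exp(-x))

# For moderate logits both forms should agree up to rounding error.
max_difference = max(
    abs(sigmoid_cross_entropy(x, z) - naive_sigmoid_cross_entropy(x, z))
    for x in (-3.0, -0.5, 0.0, 2.0)
    for z in (0.0, 1.0)
)

# The stable form stays finite for an extreme negative logit.
extreme_loss = sigmoid_cross_entropy(-1000.0, 1.0)
```

Because `exp` is only ever applied to `-abs(x)`, its argument is never positive, so the exponential cannot overflow for either sign of `x`.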

##### Args:
• logits: A Tensor of type float32 or float64.
• targets: A Tensor of the same type and shape as logits.
• name: A name for the operation (optional).
##### Returns:

A Tensor of the same shape as logits with the componentwise logistic losses.

##### Raises:
• ValueError: If logits and targets do not have the same shape.

### tf.nn.softmax(logits, name=None)

Computes softmax activations.

For each batch i and class j we have

softmax[i, j] = exp(logits[i, j]) / sum_k(exp(logits[i, k]))
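
As an illustration of the formula, here is a minimal sketch in plain Python for one row of logits. The helper name `softmax` is used only for this example, and subtracting the row maximum before exponentiating is an assumption of the sketch (a common trick that leaves the result unchanged), not something this documentation states about the op.

```python
import math

def softmax(row):
    """Softmax over one row of logits (a single batch entry)."""
    # Assumption of this sketch: shift by the row maximum so exp() cannot overflow.
    # The shift cancels between numerator and denominator.
    row_max = max(row)
    exps = [math.exp(value - row_max) for value in row]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```

Each output row is non-negative and sums to 1, and larger logits map to larger probabilities.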

##### Args:
• logits: A Tensor. Must be one of the following types: half, float32, float64. 2-D with shape [batch_size, num_classes].
• name: A name for the operation (optional).
##### Returns:

A Tensor. Has the same type as logits. Same shape as logits.

### tf.nn.log_softmax(logits, name=None)

Computes log softmax activations.

For each batch i and class j we have

logsoftmax[i, j] = logits[i, j] - log(sum(exp(logits[i])))
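
The formula can be sketched in plain Python for a single row as follows. The helper name `log_softmax` is local to this example, and the maximum shift inside the log-sum-exp is an assumption added for numerical safety rather than a documented detail of the op.

```python
import math

def log_softmax(row):
    """logits[j] - log(sum(exp(logits))) for one row of logits."""
    # Assumption of this sketch: compute log(sum(exp(row))) as
    # row_max + log(sum(exp(row - row_max))) to avoid overflow.
    row_max = max(row)
    log_sum = row_max + math.log(sum(math.exp(v - row_max) for v in row))
    return [v - log_sum for v in row]

log_probs = log_softmax([1.0, 2.0, 3.0])
```

Exponentiating the outputs recovers the softmax probabilities, so they sum to 1.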

##### Args:
• logits: A Tensor. Must be one of the following types: half, float32, float64. 2-D with shape [batch_size, num_classes].
• name: A name for the operation (optional).
##### Returns:

A Tensor. Has the same type as logits. Same shape as logits.

### tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)

Computes softmax cross entropy between logits and labels.

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If a row is not, the computation of the gradient will be incorrect.

If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float16, float32, or float64).
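
To make the shapes concrete, the following plain-Python sketch computes one loss per row as the negative sum of `labels[i][j] * log_softmax(logits[i])[j]`. It is an illustration under that standard definition of cross entropy, not the op's implementation, and the helper names `log_softmax` and `softmax_cross_entropy` are hypothetical.

```python
import math

def log_softmax(row):
    """Log softmax of one row, shifted by the row maximum for stability."""
    row_max = max(row)
    log_sum = row_max + math.log(sum(math.exp(v - row_max) for v in row))
    return [v - log_sum for v in row]

def softmax_cross_entropy(logits, labels):
    """Return one loss per row of logits; each labels row is a distribution."""
    losses = []
    for logit_row, label_row in zip(logits, labels):
        log_probs = log_softmax(logit_row)
        losses.append(-sum(p * lp for p, lp in zip(label_row, log_probs)))
    return losses

# Two batch entries, three classes: a one-hot row and a soft-label row.
logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
labels = [[1.0, 0.0, 0.0], [0.2, 0.3, 0.5]]
losses = softmax_cross_entropy(logits, labels)
```

The input is raw logits: the softmax is applied inside the helper, which mirrors the warning above about not passing softmax output.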

##### Args:
• logits: Unscaled log probabilities.
• labels: Each row labels[i] must be a valid probability distribution.
• name: A name for the operation (optional).
##### Returns:

A 1-D Tensor of length batch_size of the same type as logits with the softmax cross entropy loss.

### tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels, name=None)

Computes sparse softmax cross entropy between logits and labels.

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry). For soft softmax classification with a probability distribution for each entry, see softmax_cross_entropy_with_logits.

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size], but higher dimensions are supported.
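
For the common [batch_size, num_classes] case, the sparse loss reduces to picking the negative log softmax at the labeled index. The sketch below shows this in plain Python; it is illustrative only, and the helper names `log_softmax` and `sparse_softmax_cross_entropy` are hypothetical.

```python
import math

def log_softmax(row):
    """Log softmax of one row, shifted by the row maximum for stability."""
    row_max = max(row)
    log_sum = row_max + math.log(sum(math.exp(v - row_max) for v in row))
    return [v - log_sum for v in row]

def sparse_softmax_cross_entropy(logits, labels):
    """One loss per row; labels[i] is the integer index of the true class."""
    return [-log_softmax(row)[index] for row, index in zip(logits, labels)]

# logits has shape [2, 3]; labels has shape [2] with indices in [0, 3).
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [0, 1]
losses = sparse_softmax_cross_entropy(logits, labels)
```

This gives the same value as the dense formulation with a one-hot row, without materializing the one-hot labels.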

##### Args:
• logits: Unscaled log probabilities of rank r and shape [d_0, d_1, ..., d_{r-2}, num_classes] and dtype float32 or float64.
• labels: Tensor of shape [d_0, d_1, ..., d_{r-2}] and dtype int32 or int64. Each entry in labels must be an index in [0, num_classes). Other values will result in a loss of 0, but incorrect gradient computations.
• name: A name for the operation (optional).
##### Returns:

A Tensor of the same shape as labels and of the same type as logits with the softmax cross entropy loss.

##### Raises:
• ValueError: If logits are scalars (need to have rank >= 1) or if the rank of the labels is not equal to the rank of the logits minus one.

### tf.nn.weighted_cross_entropy_with_logits(logits, targets, pos_weight, name=None)

Computes a weighted cross entropy.

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.

The usual cross-entropy cost is defined as:

targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))

The argument pos_weight is used as a multiplier for the positive targets:

targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))

For brevity, let x = logits, z = targets, q = pos_weight. The loss is:

  qz * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= qz * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
= qz * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= qz * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x + (qz + 1 - z) * log(1 + exp(-x))
= (1 - z) * x + (1 + (q - 1) * z) * log(1 + exp(-x))


Setting l = (1 + (q - 1) * z), to ensure stability and avoid overflow, the implementation uses

(1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0))


logits and targets must have the same type and shape.
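
The weighted formulation can be sketched for a single scalar logit in plain Python as follows. This is an illustration of the expression above rather than the TensorFlow implementation; the function name `weighted_cross_entropy` is hypothetical, `weight` plays the role of l, and `math.log1p(v)` stands in for `log(1 + v)`.

```python
import math

def weighted_cross_entropy(x, z, q):
    """(1 - z) * x + l * (log(1 + exp(-abs(x))) + max(-x, 0)), l = 1 + (q - 1) * z."""
    weight = 1.0 + (q - 1.0) * z  # called l in the text
    return (1.0 - z) * x + weight * (math.log1p(math.exp(-abs(x))) + max(-x, 0.0))

# With pos_weight = 1 the loss reduces to the unweighted logistic loss.
unweighted = weighted_cross_entropy(0.5, 1.0, 1.0)

# Doubling pos_weight doubles the loss on a positive target.
doubled = weighted_cross_entropy(0.5, 1.0, 2.0)
```

A pos_weight above 1 increases the penalty for positive targets and a value below 1 decreases it, while the loss for negative targets (z = 0) is unchanged because l is then 1.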

##### Args:
• logits: A Tensor of type float32 or float64.
• targets: A Tensor of the same type and shape as logits.
• pos_weight: A coefficient to use on the positive examples.
• name: A name for the operation (optional).
##### Returns:

A Tensor of the same shape as logits with the componentwise weighted logistic losses.

##### Raises:
• ValueError: If logits and targets do not have the same shape.