Implements the focal loss function.

Focal loss was first introduced in the RetinaNet paper. It is especially useful for classification with highly imbalanced classes: it down-weights well-classified examples and focuses training on hard examples. The loss for a misclassified sample is much higher than the loss for a well-classified one. A prominent use case is object detection, where the imbalance between the background class and the other classes is extremely high.
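For reference, the focal loss for a single binary prediction is usually written as below. Here p is the predicted probability of the positive class, y is the ground-truth label, and the subscript t marks quantities taken with respect to the true class. This is the standard form from the RetinaNet paper; whether the implementation documented here matches it term by term is an assumption.

```latex
\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma}\, \log(p_t),
\qquad
p_t =
\begin{cases}
p     & \text{if } y = 1 \\
1 - p & \text{otherwise}
\end{cases},
\qquad
\alpha_t =
\begin{cases}
\alpha     & \text{if } y = 1 \\
1 - \alpha & \text{otherwise}
\end{cases}
```

With gamma = 0 the modulating term (1 - p_t)^gamma equals 1 and the expression reduces to alpha-weighted cross-entropy. Larger values of gamma shrink the contribution of examples whose p_t is already close to 1.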


import tensorflow_addons as tfa

fl = tfa.losses.SigmoidFocalCrossEntropy()
loss = fl(
    y_true=[[1.0], [1.0], [0.0]], y_pred=[[0.97], [0.91], [0.03]])
# loss evaluates to:
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([6.8532745e-06, 1.9097870e-04, 2.0559824e-05], dtype=float32)>

Usage with tf.keras API:

model = tf.keras.Model()
model.compile('sgd', loss=tfa.losses.SigmoidFocalCrossEntropy())
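
To make the roles of alpha and gamma concrete, the following minimal sketch recomputes the three values from the first usage example in plain Python. The helper name sigmoid_focal_loss is hypothetical. The formula is the standard focal loss applied to probabilities (not logits), which is assumed here to be what the library computes by default.

```python
import math


def sigmoid_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
    """Focal loss for one binary label and one predicted probability.

    Hypothetical helper for illustration; not part of the TFA API.
    """
    # Binary cross-entropy for this single prediction.
    ce = -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))
    # p_t is the probability the model assigned to the true class.
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    # alpha weights positive examples, (1 - alpha) weights negative ones.
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return alpha_t * (1.0 - p_t) ** gamma * ce


labels = [1.0, 1.0, 0.0]
probs = [0.97, 0.91, 0.03]
losses = [sigmoid_focal_loss(y, p) for y, p in zip(labels, probs)]

# A badly misclassified positive example, for contrast.
hard = sigmoid_focal_loss(1.0, 0.10)

print(losses)
print(hard)
```

The three entries of losses agree with the tensor shown in the usage example up to float32 rounding, and the loss for the misclassified example is several orders of magnitude larger.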

Args:
alpha: balancing factor, default value is 0.25.
gamma: modulating factor, default value is 2.0.

Returns:
Weighted loss float Tensor. If reduction is NONE, this has the same shape as y_true; otherwise, it is scalar.

Raises:
ValueError: If the shape of sample_weight is invalid or the value of gamma is less than zero.



from_config(config)

Instantiates a Loss from its config (output of get_config()).

Args:
config: Output of get_config().

Returns:
A Loss instance.


get_config()

Returns the config dictionary for a Loss instance.
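
The config round trip can be illustrated with a small stand-in class. ToyFocalLoss below is hypothetical and is not the TFA class. It only sketches the general pattern in which the config dictionary holds the constructor arguments, so a new instance can be rebuilt from it. The exact keys the real class stores are not shown here.

```python
class ToyFocalLoss:
    """Hypothetical stand-in that only demonstrates the config round trip."""

    def __init__(self, alpha=0.25, gamma=2.0):
        if gamma < 0:
            raise ValueError("gamma must be greater than or equal to zero")
        self.alpha = alpha
        self.gamma = gamma

    def get_config(self):
        # Everything needed to rebuild an equivalent instance.
        return {"alpha": self.alpha, "gamma": self.gamma}

    @classmethod
    def from_config(cls, config):
        # The config keys map directly onto constructor arguments.
        return cls(**config)


original = ToyFocalLoss(alpha=0.5, gamma=1.0)
restored = ToyFocalLoss.from_config(original.get_config())
print(restored.get_config())
```

The restored object is a separate instance with the same hyperparameters, which is the behavior a serialization round trip relies on.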


__call__(y_true, y_pred, sample_weight=None)

Invokes the Loss instance.

Args:
y_true: Ground truth values. shape = [batch_size, d0, .. dN], except for sparse loss functions such as sparse categorical crossentropy, where shape = [batch_size, d0, .. dN-1].
y_pred: The predicted values. shape = [batch_size, d0, .. dN].
sample_weight: Optional. Acts as a coefficient for the loss. If a scalar is provided, the loss is simply scaled by the given value. If sample_weight is a tensor of size [batch_size], the total loss for each sample of the batch is rescaled by the corresponding element of the sample_weight vector. If the shape of sample_weight is [batch_size, d0, .. dN-1] (or can be broadcast to this shape), each loss element of y_pred is scaled by the corresponding value of sample_weight. (Note on dN-1: all loss functions reduce by 1 dimension, usually axis=-1.)

Returns:
Weighted loss float Tensor. If reduction is NONE, this has shape [batch_size, d0, .. dN-1]; otherwise, it is scalar. (Note dN-1 because all loss functions reduce by 1 dimension, usually axis=-1.)

Raises:
ValueError: If the shape of sample_weight is invalid.
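
The scalar and per-sample cases of sample_weight described above can be sketched in plain Python. The helper apply_sample_weight is hypothetical and handles only a flat list of per-sample losses. It does not reproduce the library's full broadcasting rules or any reduction step.

```python
def apply_sample_weight(losses, sample_weight=None):
    """Scale per-sample losses by a scalar or by one weight per sample.

    Hypothetical helper for illustration; not part of the Keras or TFA API.
    """
    if sample_weight is None:
        return list(losses)
    if isinstance(sample_weight, (int, float)):
        # Scalar weight: every loss is scaled by the same value.
        return [loss * sample_weight for loss in losses]
    if len(sample_weight) != len(losses):
        # Mirrors the documented ValueError for an invalid weight shape.
        raise ValueError("sample_weight must be a scalar or have one entry per sample")
    # Per-sample weights: each loss is scaled by its own coefficient.
    return [loss * w for loss, w in zip(losses, sample_weight)]


per_sample = [0.2, 0.4, 0.6]
scaled = apply_sample_weight(per_sample, 2.0)
reweighted = apply_sample_weight(per_sample, [1.0, 0.0, 0.5])
print(scaled)
print(reweighted)
```

A weight of 0.0 removes a sample's contribution entirely, which is a common way to mask padded or ignored examples.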