


Adds a Jensen-Shannon divergence to the training procedure.


For brevity, let P = labels, Q = predictions, and KL(P||Q) be the Kullback-Leibler divergence. The Jensen-Shannon divergence (JSD) is:

M = (P + Q) / 2
JSD(P||Q) = KL(P||M) / 2 + KL(Q||M) / 2
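As a sanity check of the definitions above, here is a minimal NumPy sketch of JSD for a single pair of distributions (an illustration only, not the TensorFlow implementation):

```python
import numpy as np

def kl(p, q):
    # KL(P||Q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    # M = (P + Q) / 2; JSD(P||Q) = KL(P||M) / 2 + KL(Q||M) / 2
    m = (np.asarray(p, dtype=float) + np.asarray(q, dtype=float)) / 2
    return kl(p, m) / 2 + kl(q, m) / 2

p = [0.5, 0.5, 0.0]
q = [0.1, 0.6, 0.3]
print(jsd(p, q))
```

Unlike KL, JSD is symmetric in its arguments and always finite: M has positive mass wherever P or Q does, so neither KL term diverges. Its value lies between 0 and log(2).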

Note that the function assumes predictions and labels are the values of a multinomial distribution, i.e., each value along axis is the probability of the corresponding class; the values must therefore be non-negative and sum to 1 along axis.
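If your model emits unnormalized logits rather than probabilities, normalize them first, for example with a softmax (a generic sketch, not part of this API):

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the per-row max for numerical stability, then normalize so
    # that values along `axis` are non-negative and sum to 1.
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

probs = softmax([[2.0, 1.0, 0.1]])
print(probs.sum(axis=-1))  # each row sums to 1
```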

For the usage of weights and reduction, please refer to tf.losses.

Args:
  • labels: Tensor of type float32 or float64, with shape [d1, ..., dN, num_classes], representing the target distribution.
  • predictions: Tensor of the same type and shape as labels, representing the predicted distribution.
  • axis: The dimension along which the Jensen-Shannon divergence is computed. Note that the values of labels and predictions along axis must form a valid probability distribution (non-negative and summing to 1).
  • weights: (optional) Tensor whose rank is either 0, or the same rank as labels, and must be broadcastable to labels (i.e., all dimensions must be either 1, or the same as the corresponding losses dimension).
  • scope: The scope for the operations performed in computing the loss.
  • loss_collection: collection to which the loss will be added.
  • reduction: Type of reduction to apply to loss.

Returns:
Weighted loss float Tensor. If reduction is NONE, this has the same shape as labels; otherwise, it is scalar.
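To make the weight and reduction semantics concrete, here is a hedged NumPy sketch: per-example JSD is computed along axis, multiplied by weights, and either returned elementwise or reduced to a scalar. The reduction convention shown (sum divided by the number of nonzero weights) is one common choice in tf.losses; the actual op may differ in edge cases.

```python
import numpy as np

def batched_jsd(labels, predictions, axis=-1):
    # Elementwise JSD along `axis`; inputs must be probability distributions.
    p = np.asarray(labels, dtype=float)
    q = np.asarray(predictions, dtype=float)
    m = (p + q) / 2.0
    def kl(a, b):
        # Guard a == 0 entries: they contribute 0 to the sum.
        ratio = np.where(a > 0, a / np.where(b > 0, b, 1.0), 1.0)
        return np.sum(np.where(a > 0, a * np.log(ratio), 0.0), axis=axis)
    return kl(p, m) / 2 + kl(q, m) / 2

labels = np.array([[1.0, 0.0], [0.5, 0.5]])
preds  = np.array([[0.8, 0.2], [0.5, 0.5]])
weights = np.array([1.0, 0.0])

losses = batched_jsd(labels, preds)                      # one value per example
scalar = np.sum(losses * weights) / np.sum(weights > 0)  # weighted scalar reduction
```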

Raises:
  • InvalidArgumentError: If the values of labels or predictions do not form a valid probability distribution along axis.
  • ValueError: If axis is None, if the shape of predictions doesn't match that of labels, or if the shape of weights is invalid.