Adds a KL-divergence to the training procedure.
nsl.lib.kl_divergence(
    labels, predictions, axis=None, weights=1.0, scope=None,
    loss_collection=tf.compat.v1.GraphKeys.LOSSES,
    reduction=tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS
)
For brevity, let P = labels and Q = predictions. The Kullback-Leibler divergence is

KL(P||Q) = P * log(P) - P * log(Q)

For the usage of reduction, please refer to tf.compat.v1.losses.
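As an illustrative sketch (not part of the original reference), the call below shows a typical use. The tensor values are made up for the example; it assumes labels and predictions are batches of probability distributions over three classes (non-negative values summing to 1 along the class axis).

import tensorflow as tf
import neural_structured_learning as nsl

# Made-up example batch: each row is a distribution over 3 classes.
labels = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])
predictions = tf.constant([[0.6, 0.3, 0.1],
                           [0.2, 0.2, 0.6]])

# KL(P||Q) computed along the class axis; the default reduction
# (SUM_BY_NONZERO_WEIGHTS) returns a scalar loss.
loss = nsl.lib.kl_divergence(labels, predictions, axis=1)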
Args:
labels: Tensor of type float32 or float64, with shape [d1, ..., dN, num_classes], represents the target distribution.
predictions: Tensor of the same type and shape as labels, represents the predicted distribution.
axis: The dimension along which the KL divergence is computed. The values of labels and predictions along axis should meet the requirements of a multinomial distribution.
weights: (optional) Tensor whose rank is either 0, or the same as that of labels, and must be broadcastable to labels (i.e., all dimensions must be either 1, or the same as the corresponding labels dimension).
scope: The scope for the operations performed in computing the loss.
loss_collection: Collection to which the loss will be added.
reduction: Type of reduction to apply to the loss.
Returns:
Weighted loss float Tensor. If reduction is NONE, this has the same shape as labels; otherwise, it is a scalar.
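For instance (an illustrative sketch continuing the made-up tensors above, not from the original text), requesting no reduction returns the element-wise losses with the same shape as labels:

# Element-wise losses shaped like labels (here [2, 3]).
losses = nsl.lib.kl_divergence(
    labels, predictions, axis=1,
    reduction=tf.compat.v1.losses.Reduction.NONE)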
Raises:
If labels or predictions don't meet the requirements of a multinomial distribution.
If axis is None, if the shape of predictions doesn't match that of labels, or if the shape of