tf.nn.ctc_loss

Computes CTC (Connectionist Temporal Classification) loss.

This op implements the CTC loss as presented in Graves et al., 2006.

Connectionist temporal classification (CTC) is a type of neural network output and associated scoring function for training recurrent neural networks (RNNs), such as LSTM networks, on sequence problems where the timing is variable. It can be used for tasks like online handwriting recognition or recognizing phones in speech audio. CTC refers to the outputs and scoring, and is independent of the underlying neural network structure.

Notes:

  • This op performs the softmax operation for you, so logits should be unnormalized scores, e.g. linear projections of the outputs of an LSTM.
  • The network outputs true repeated classes with blanks in between, and can also output repeated classes with no blanks in between; the latter need to be collapsed by the decoder.
  • labels may be supplied as either a dense, zero-padded Tensor with a vector of label sequence lengths OR as a SparseTensor.
  • On TPU: Only dense padded labels are supported.
  • On CPU and GPU: The caller may use a SparseTensor or dense padded labels, but calling with a SparseTensor will be significantly faster.
  • Default blank label is 0 instead of num_labels - 1 (where num_labels is the innermost dimension size of logits), unless overridden by blank_index.
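
Example:

import tensorflow as tf

tf.random.set_seed(50)
batch_size = 8
num_labels = 6
max_label_length = 5
num_frames = 12

# Random dense labels in [1, num_labels); class 0 is reserved for the blank.
labels = tf.random.uniform([batch_size, max_label_length],
                           minval=1, maxval=num_labels, dtype=tf.int64)
# Time-major logits of shape [num_frames, batch_size, num_labels].
logits = tf.random.uniform([num_frames, batch_size, num_labels])
# Random length for each label sequence, in [2, max_label_length).
label_length = tf.random.uniform([batch_size], minval=2,
                                 maxval=max_label_length, dtype=tf.int64)
# Zero out the padding positions beyond each sequence's length.
label_mask = tf.sequence_mask(label_length, maxlen=max_label_length,
                              dtype=label_length.dtype)
labels *= label_mask
# Every batch element uses all frames.
logit_length = [num_frames] * batch_size

# Compute the per-example loss and its gradient with respect to the logits.
with tf.GradientTape() as t:
  t.watch(logits)
  ref_loss = tf.nn.ctc_loss(
      labels=labels,
      logits=logits,
      label_length=label_length,
      logit_length=logit_length,
      blank_index=0)
ref_grad = t.gradient(ref_loss, logits)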
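
The note about repeated classes and blanks can be made concrete with a small collapse routine. The sketch below is plain Python and is not part of TensorFlow; the function name ctc_collapse is hypothetical and the blank is assumed to be class 0. It merges consecutive duplicates first and then drops blanks, which is the usual CTC collapsing rule.

```python
def ctc_collapse(path, blank=0):
    """Collapse a frame-level path into a label sequence.

    Consecutive duplicates are merged, then blanks are removed.
    `ctc_collapse` is an illustrative helper, not a TensorFlow API.
    """
    collapsed = []
    previous = None
    for symbol in path:
        # Keep a symbol only when it starts a new run and is not the blank.
        if symbol != previous and symbol != blank:
            collapsed.append(symbol)
        previous = symbol
    return collapsed


# A blank between the two 3s keeps them as a true repeat.
print(ctc_collapse([3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
# Without a blank in between, adjacent 3s merge into one.
print(ctc_collapse([3, 3, 3, 5, 5, 0, 0]))  # [3, 5]
```

This is why a target with a doubled label needs at least one extra frame: the alignment must fit a blank between the two occurrences.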

Args:

  • labels: Tensor of shape [batch_size, max_label_seq_length] or SparseTensor.
  • logits: Tensor of shape [frames, batch_size, num_labels]. If logits_time_major == False, the shape is [batch_size, frames, num_labels].
  • label_length: Tensor of shape [batch_size] giving the length of each reference label sequence in labels. None if labels is a SparseTensor.
  • logit_length: Tensor of shape [batch_size] giving the length of each input sequence in logits.
  • logits_time_major: (optional) If True (default), logits is shaped [frames, batch_size, num_labels]. If False, the shape is [batch_size, frames, num_labels].
  • unique: (optional) Unique label indices as computed by ctc_unique_labels(labels). If supplied, enables a faster, more memory-efficient implementation on TPU.
  • blank_index: (optional) The class index to use for the blank label. Negative values count back from num_labels, i.e., -1 reproduces the ctc_loss behavior of using num_labels - 1 for the blank symbol. Switching from the default of 0 has some memory and performance overhead, because an additional shifted copy of logits may be created.
  • name: A name for this op. Defaults to "ctc_loss_dense".
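
The two accepted forms of labels carry the same information. As a rough illustration in plain Python, the sketch below turns zero-padded dense labels plus label_length into the (indices, values, dense_shape) triple that a SparseTensor is built from. The helper name dense_to_sparse_components is hypothetical, and nothing here calls TensorFlow.

```python
def dense_to_sparse_components(labels, label_length):
    """Return (indices, values, dense_shape) for zero-padded dense labels.

    Only the first `label_length[i]` entries of row i are kept, so padding
    never reaches the sparse form. Illustrative helper, not a TensorFlow API.
    """
    indices, values = [], []
    for row, (sequence, length) in enumerate(zip(labels, label_length)):
        for col in range(length):
            indices.append([row, col])
            values.append(sequence[col])
    dense_shape = [len(labels), max(len(sequence) for sequence in labels)]
    return indices, values, dense_shape


padded = [[2, 4, 1, 0, 0],
          [5, 3, 0, 0, 0]]
lengths = [3, 2]
indices, values, dense_shape = dense_to_sparse_components(padded, lengths)
print(indices)      # [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]]
print(values)       # [2, 4, 1, 5, 3]
print(dense_shape)  # [2, 5]
```

Because the sparse form already records which positions are real labels, label_length is not needed alongside it.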

Returns:

  • loss: A 1-D float Tensor of shape [batch_size], containing negative log probabilities.
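
To show what a single entry of this result means, here is a minimal pure-Python sketch of the forward recursion from Graves et al., 2006, for one sequence. It is a textbook illustration under stated assumptions (one example, blank at class 0, softmax applied to the logits first), not the TensorFlow implementation. The names ctc_neg_log_prob, log_softmax and logsumexp are local to this sketch.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one frame of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def logsumexp(values):
    """log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_prob(logits, labels, blank=0):
    """Negative log probability of `labels` given per-frame `logits`.

    logits: list of frames, each a list of num_labels unnormalized scores.
    labels: target label sequence without blanks.
    """
    log_probs = [log_softmax(frame) for frame in logits]
    # Extended target: blank, l1, blank, l2, ..., blank.
    extended = [blank]
    for label in labels:
        extended += [label, blank]
    size = len(extended)

    # alpha[s] is the log probability of all partial alignments that end
    # at position s of the extended target after the current frame.
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][extended[0]]
    if size > 1:
        alpha[1] = log_probs[0][extended[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * size
        for s in range(size):
            candidates = [alpha[s]]               # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])   # advance by one
            # Skipping the blank is allowed only between different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][extended[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    final = [alpha[-1]]
    if size > 1:
        final.append(alpha[-2])
    return -logsumexp(final)


# Two frames, two classes, uniform scores: 3 of the 4 paths collapse to [1].
print(round(ctc_neg_log_prob([[0.0, 0.0], [0.0, 0.0]], [1]), 4))  # 0.2877
```

When the target cannot fit into the available frames (for example a doubled label with too few frames to place a blank in between), the sketch returns infinity, since no alignment has nonzero probability.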

Raises:

  • ValueError: If blank_index is not provided when labels is a SparseTensor.
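
The blank_index rules described on this page (default of 0 for dense labels, negative values counted back from num_labels, and the error for sparse labels) can be summarized in a few lines. The sketch below is an assumption-laden paraphrase in plain Python, not TensorFlow source. The function resolve_blank_index and its labels_are_sparse flag are hypothetical names introduced only for illustration.

```python
def resolve_blank_index(blank_index, num_labels, labels_are_sparse=False):
    """Return the effective blank class index.

    Hypothetical helper that mirrors the documented rules; it is not a
    TensorFlow function.
    """
    if blank_index is None:
        if labels_are_sparse:
            # Documented behavior: sparse labels require an explicit blank.
            raise ValueError(
                "blank_index must be provided when labels is a SparseTensor.")
        return 0  # documented default for dense labels
    if blank_index < 0:
        # Negative values count back from num_labels, so -1 is the last class.
        blank_index += num_labels
    return blank_index


print(resolve_blank_index(None, 6))  # 0
print(resolve_blank_index(-1, 6))    # 5
print(resolve_blank_index(2, 6))     # 2
```

Passing blank_index=-1 is therefore the way to keep the older convention of putting the blank in the last class.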

References:

Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Graves et al., 2006.

https://en.wikipedia.org/wiki/Connectionist_temporal_classification