text.FastSentencepieceTokenizer

A SentencePiece tokenizer with the tf.text interface.

Methods

detokenize

Detokenizes tokens into preprocessed text.

Args
input A RaggedTensor or Tensor of rank >= 1 containing int32-encoded text (token IDs).

Returns
An (N-1)-dimensional string Tensor or RaggedTensor containing the detokenized text, where N is the rank of input.
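
The rank reduction described above can be illustrated with a small pure-Python sketch. This is a toy stand-in, not the library's implementation: the piece table TOY_PIECES and the helper toy_detokenize are hypothetical names invented for this example, and nested Python lists stand in for Tensor and RaggedTensor inputs. It only shows how the innermost axis of token IDs collapses into one string per sequence.

```python
# Hypothetical id-to-piece table for illustration only; a real SentencePiece
# model supplies its own vocabulary. "\u2581" is the word-boundary marker
# that SentencePiece pieces conventionally carry.
TOY_PIECES = {
    0: "\u2581hello",
    1: "\u2581world",
    2: "\u2581token",
    3: "izer",
}


def toy_detokenize(ids):
    """Collapse the innermost axis of nested id lists into strings.

    A rank-N nested list of ints becomes a rank-(N-1) nested list of strings.
    """
    # Recurse until we reach a flat list of ints (the innermost axis).
    if ids and isinstance(ids[0], list):
        return [toy_detokenize(row) for row in ids]
    # Join the pieces, then turn boundary markers back into spaces.
    text = "".join(TOY_PIECES[i] for i in ids)
    return text.replace("\u2581", " ").strip()


# Rank-1 input gives a single string (rank 0).
print(toy_detokenize([0, 1]))               # hello world

# Ragged rank-2 input gives a rank-1 list of strings.
print(toy_detokenize([[0, 1], [2, 3, 1]]))  # ['hello world', 'tokenizer world']
```

The second call uses rows of different lengths to mirror the RaggedTensor case: each row is detokenized independently, so the output has one string per row regardless of row length.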

tokenize

The main tokenization function.

vocab_size

Returns the size of the vocabulary in the SentencePiece model.