text.FastSentencepieceTokenizer

Sentencepiece tokenizer with tf.text interface.

Methods

detokenize


Detokenizes tokens into preprocessed text.

Args
input: A RaggedTensor or Tensor of int32 token IDs with rank >= 1.

Returns
An (N-1)-dimensional string Tensor or RaggedTensor of the detokenized text, where N is the rank of input.
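The rank contract above says that the innermost dimension of token IDs collapses into a single string. The pure-Python sketch below illustrates this under stated assumptions. It does not call tensorflow_text. TOY_VOCAB and toy_detokenize are hypothetical names, and the vocabulary is invented for illustration. Nested Python lists stand in for Tensor/RaggedTensor. The only real SentencePiece convention it relies on is that "\u2581" marks a word boundary.

```python
# Hypothetical toy vocabulary (NOT a real SentencePiece model); index = token ID.
# "\u2581" is the word-boundary marker SentencePiece prepends to word-initial pieces.
TOY_VOCAB = ["<unk>", "\u2581hello", "\u2581wor", "ld", "\u2581the", "re"]


def toy_detokenize(ids):
    """Map nested lists of int IDs with rank N to strings with rank N-1.

    Each innermost list is one token sequence and becomes one string;
    outer dimensions (ragged or not) are preserved.
    """
    if ids and isinstance(ids[0], list):
        # Not at the innermost dimension yet: keep this dimension and recurse.
        return [toy_detokenize(row) for row in ids]
    # Look up each ID's piece and concatenate the pieces.
    text = "".join(TOY_VOCAB[i] for i in ids)
    # Turn boundary markers back into spaces.
    text = text.replace("\u2581", " ")
    # Drop the single leading space introduced by the first word's marker.
    return text[1:] if text.startswith(" ") else text
```

For example, a rank-1 input such as `[1, 2, 3]` yields one string. A ragged rank-2 input such as `[[1, 2, 3], [4, 5]]` yields a rank-1 list of two strings.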

tokenize


The main tokenization function. Converts input strings into int32 token IDs from the SentencePiece vocabulary.
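To show the input/output relationship of tokenize (strings in, int32 IDs out, ragged across a batch), here is a self-contained toy sketch. Its assumptions: TOY_VOCAB, PIECE_TO_ID, UNK_ID, and toy_tokenize are hypothetical names, and the vocabulary is invented. The segmentation is greedy longest-match, a simple stand-in that is **not** SentencePiece's unigram or BPE algorithm. The pre-processing step of prepending "\u2581" and replacing spaces with it mirrors SentencePiece's whitespace handling.

```python
# Hypothetical toy vocabulary (NOT a real SentencePiece model); index = token ID.
TOY_VOCAB = ["<unk>", "\u2581hello", "\u2581wor", "ld", "\u2581the", "re"]
PIECE_TO_ID = {piece: i for i, piece in enumerate(TOY_VOCAB)}
UNK_ID = 0


def toy_tokenize(text):
    """Tokenize one string into a list of IDs, or a list of strings into a
    ragged list of ID lists (rows may differ in length)."""
    if isinstance(text, list):
        return [toy_tokenize(t) for t in text]
    # SentencePiece-style whitespace handling: mark word starts with "\u2581".
    s = "\u2581" + text.replace(" ", "\u2581")
    ids = []
    i = 0
    while i < len(s):
        # Greedy longest match: a stand-in, NOT SentencePiece's unigram/BPE search.
        for j in range(len(s), i, -1):
            if s[i:j] in PIECE_TO_ID:
                ids.append(PIECE_TO_ID[s[i:j]])
                i = j
                break
        else:
            # No piece starts here: emit <unk> and skip one character.
            ids.append(UNK_ID)
            i += 1
    return ids
```

With this vocabulary, `toy_tokenize("hello world")` segments into the pieces "\u2581hello", "\u2581wor", "ld". A batch such as `["hello", "there"]` produces rows of different lengths, which is why the real tokenizer returns a RaggedTensor for batched input.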

vocab_size


Returns the size of the vocabulary in the SentencePiece model.