SentencePiece tokenizer with the tf.text interface.
text.FastSentencepieceTokenizer(
    model, reverse=False, add_bos=False, add_eos=False
)
Methods
detokenize
detokenize(input)
Detokenizes tokens into preprocessed text.
Args:
  input: A RaggedTensor or Tensor of int32-encoded text with rank >= 1.
Returns:
  An N-1 dimensional string Tensor or RaggedTensor of the detokenized text.
tokenize
tokenize(inputs)
The main tokenization function: converts input strings into int32 token ids.
vocab_size
vocab_size()
Returns the size of the vocabulary in the SentencePiece model.