
Module: tfds.features.text


Text utilities.

tfds includes a set of TextEncoders as well as a Tokenizer to enable expressive, performant, and reproducible natural language research.

Classes

class ByteTextEncoder: Byte-encodes text.

class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.

class TextEncoder: Abstract base class for converting between text and integers.

class TextEncoderConfig: Configuration for tfds.features.Text.

class TokenTextEncoder: TextEncoder backed by a list of tokens.

class Tokenizer: Splits a string into tokens, and joins them back.
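The `TextEncoder` entry above describes an abstract base class for converting between text and integers. The following minimal sketch shows what such a contract can look like in plain Python, assuming three members: `encode`, `decode`, and `vocab_size`. `TextEncoderSketch` and the toy `CharEncoder` subclass are hypothetical names, not part of tfds, and the real base class may define more than this.

```python
import abc
from typing import List, Sequence


class TextEncoderSketch(abc.ABC):
    """Hypothetical interface: convert text <-> a list of integer ids."""

    @abc.abstractmethod
    def encode(self, s: str) -> List[int]:
        """Convert a string into a list of integer ids."""

    @abc.abstractmethod
    def decode(self, ids: Sequence[int]) -> str:
        """Convert a list of integer ids back into a string."""

    @property
    @abc.abstractmethod
    def vocab_size(self) -> int:
        """Size of the id space, including any reserved ids."""


class CharEncoder(TextEncoderSketch):
    """Toy subclass: one id per character of a fixed alphabet, 0 is padding.

    Assumes every input character is in the alphabet (otherwise KeyError).
    """

    def __init__(self, alphabet: str):
        self._alphabet = alphabet
        # Shift indices by 1 so that id 0 stays free for padding.
        self._char_to_id = {c: i + 1 for i, c in enumerate(alphabet)}

    def encode(self, s: str) -> List[int]:
        return [self._char_to_id[c] for c in s]

    def decode(self, ids: Sequence[int]) -> str:
        # Skip padding ids and undo the +1 shift.
        return "".join(self._alphabet[i - 1] for i in ids if i != 0)

    @property
    def vocab_size(self) -> int:
        return len(self._alphabet) + 1  # alphabet + padding id
```

Keeping `vocab_size` on the interface lets downstream code size an embedding table without knowing which concrete encoder produced the ids.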
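To make "byte-encodes text" in the `ByteTextEncoder` entry above concrete, here is a minimal plain-Python sketch (no tfds import) of byte-level encoding. It assumes UTF-8, and it assumes id 0 is reserved for padding so that each byte value `b` maps to id `b + 1`. Both are assumptions of this sketch rather than documented behavior of `ByteTextEncoder`, and `SimpleByteEncoder` is a hypothetical name.

```python
from typing import List, Sequence


class SimpleByteEncoder:
    """Hypothetical byte-level encoder (not the tfds class).

    Each UTF-8 byte b maps to id b + 1, leaving id 0 free for padding.
    """

    NUM_BYTES = 256

    @property
    def vocab_size(self) -> int:
        return self.NUM_BYTES + 1  # 256 byte values + 1 padding id

    def encode(self, s: str) -> List[int]:
        # Encode to UTF-8 and shift every byte value up by one.
        return [b + 1 for b in s.encode("utf-8")]

    def decode(self, ids: Sequence[int]) -> str:
        # Drop padding (0), undo the +1 shift, and decode the bytes as UTF-8.
        raw = bytes(i - 1 for i in ids if i != 0)
        return raw.decode("utf-8", errors="replace")
```

Because every string has a UTF-8 byte representation, this scheme never needs an out-of-vocabulary id; the trade-off is longer sequences than word- or subword-level encodings of the same text.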
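The `SubwordTextEncoder` entry above combines word pieces with a byte-level fallback. The sketch below shows one way such a scheme can stay invertible: greedy longest-match against a fixed list of subwords, with any character that no subword covers emitted as its raw UTF-8 bytes. `SimpleSubwordEncoder`, the id layout (0 for padding, then the subwords, then 256 byte ids), and the greedy matching strategy are all assumptions for illustration; the sketch does not learn a vocabulary from a corpus and is not the tfds algorithm.

```python
from typing import List, Sequence


class SimpleSubwordEncoder:
    """Hypothetical greedy subword encoder with a byte-level fallback.

    Assumed id layout: 0 = padding, 1..n = subwords, n+1..n+256 = raw bytes.
    """

    def __init__(self, subwords: Sequence[str]):
        self._subwords = list(subwords)
        self._subword_to_id = {sw: i + 1 for i, sw in enumerate(self._subwords)}
        self._max_len = max((len(sw) for sw in self._subwords), default=0)
        self._byte_offset = len(self._subwords) + 1  # first byte id

    @property
    def vocab_size(self) -> int:
        return 1 + len(self._subwords) + 256

    def encode(self, s: str) -> List[int]:
        ids: List[int] = []
        pos = 0
        while pos < len(s):
            # Greedy longest match: try the longest candidate piece first.
            for length in range(min(self._max_len, len(s) - pos), 0, -1):
                piece = s[pos:pos + length]
                if piece in self._subword_to_id:
                    ids.append(self._subword_to_id[piece])
                    pos += length
                    break
            else:
                # No subword matched: emit the UTF-8 bytes of one character.
                ids.extend(self._byte_offset + b for b in s[pos].encode("utf-8"))
                pos += 1
        return ids

    def decode(self, ids: Sequence[int]) -> str:
        out = bytearray()
        for i in ids:
            if i == 0:
                continue  # padding
            if i < self._byte_offset:
                out.extend(self._subwords[i - 1].encode("utf-8"))
            else:
                out.append(i - self._byte_offset)
        return out.decode("utf-8", errors="replace")
```

Because subwords and fallback bytes both decode to exactly the characters they came from, `decode(encode(s)) == s` holds for any string, which is the invertibility the class description refers to.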
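The `TokenTextEncoder` entry above is described as a `TextEncoder` backed by a list of tokens. Here is a minimal sketch of that idea, assuming whitespace tokenization, id 0 for padding, and a single shared id for every out-of-vocabulary (OOV) token. `SimpleTokenEncoder` and its `oov_token` parameter are hypothetical and may differ from the tfds class.

```python
from typing import List, Sequence


class SimpleTokenEncoder:
    """Hypothetical token-list encoder (not the tfds class).

    Assumed id layout: 0 = padding, 1..len(vocab) = vocabulary tokens,
    len(vocab) + 1 = one shared out-of-vocabulary id.
    """

    def __init__(self, vocab: Sequence[str], oov_token: str = "UNK"):
        self._vocab = list(vocab)
        self._oov_token = oov_token
        # Shift every index by 1 so that id 0 stays free for padding.
        self._token_to_id = {tok: i + 1 for i, tok in enumerate(self._vocab)}
        self._oov_id = len(self._vocab) + 1

    @property
    def vocab_size(self) -> int:
        return len(self._vocab) + 2  # padding id + vocabulary + OOV id

    def encode(self, s: str) -> List[int]:
        # Whitespace splitting stands in for a real tokenizer here.
        return [self._token_to_id.get(tok, self._oov_id) for tok in s.split()]

    def decode(self, ids: Sequence[int]) -> str:
        tokens = []
        for i in ids:
            if i == 0:
                continue  # skip padding
            if 1 <= i <= len(self._vocab):
                tokens.append(self._vocab[i - 1])
            else:
                tokens.append(self._oov_token)  # OOV or unknown id
        return " ".join(tokens)
```

Unlike a byte-level encoder, this mapping is lossy: any token missing from the list decodes to the OOV placeholder rather than the original word.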
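The `Tokenizer` entry above splits a string into tokens and joins them back. A minimal regex-based sketch follows, assuming tokens are runs of word characters (optionally also runs of punctuation) and that joining inserts single spaces. `SimpleTokenizer` and its `alphanum_only` flag are illustrative assumptions, not the documented tfds signature.

```python
import re
from typing import List

# Runs of word characters (letters, digits, underscore).
_ALNUM_RE = re.compile(r"\w+")
# Runs of word characters, or runs of non-word, non-space characters.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


class SimpleTokenizer:
    """Hypothetical regex tokenizer (not the tfds class)."""

    def __init__(self, alphanum_only: bool = True):
        # When True, punctuation is dropped; when False, it becomes tokens.
        self._alphanum_only = alphanum_only

    def tokenize(self, s: str) -> List[str]:
        pattern = _ALNUM_RE if self._alphanum_only else _TOKEN_RE
        return pattern.findall(s)

    def join(self, tokens: List[str]) -> str:
        # Joining with single spaces does not restore the original spacing.
        return " ".join(tokens)
```

With this design `join(tokenize(s))` need not reproduce `s` exactly, since dropped punctuation and original spacing are lost; byte- and subword-level encoding is the usual choice when exact round-trips matter.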