Module: tfds.features.text

Text utilities.

Defined in core/features/text/__init__.py.

tfds includes a set of TextEncoders as well as a Tokenizer to enable expressive, performant, and reproducible natural language research.

Classes

class ByteTextEncoder: Byte-encodes text.

class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.

class TextEncoder: Abstract base class for converting between text and integers.

class TextEncoderConfig: Configuration for tfds.features.Text.

class Tokenizer: Splits a string into tokens, and joins them back.

class TokenTextEncoder: TextEncoder backed by a list of tokens.
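
The encoder classes above share one contract: text goes in, a list of integer ids comes out, and decoding recovers the text. The following is a minimal pure-Python sketch of that contract using byte-level encoding. It does not import or reproduce the tfds classes. The names SketchTextEncoder and SketchByteEncoder are hypothetical, and reserving id 0 for padding (so every byte value is shifted up by one) is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchTextEncoder(ABC):
    """Hypothetical stand-in for the text <-> integer contract (not the tfds class)."""

    @abstractmethod
    def encode(self, s: str) -> List[int]:
        """Convert a string into a list of integer ids."""

    @abstractmethod
    def decode(self, ids: List[int]) -> str:
        """Convert a list of integer ids back into a string."""

    @property
    @abstractmethod
    def vocab_size(self) -> int:
        """Number of distinct ids the encoder can produce, including padding."""


class SketchByteEncoder(SketchTextEncoder):
    """Byte-level encoder: one id per UTF-8 byte."""

    PAD_ID = 0  # Assumption for this sketch: id 0 is reserved for padding.

    def encode(self, s: str) -> List[int]:
        # Shift each byte value (0..255) up by one so that 0 stays free for padding.
        return [b + 1 for b in s.encode("utf-8")]

    def decode(self, ids: List[int]) -> str:
        # Drop padding ids, undo the shift, and reassemble the UTF-8 bytes.
        raw = bytes(i - 1 for i in ids if i != self.PAD_ID)
        return raw.decode("utf-8", errors="replace")

    @property
    def vocab_size(self) -> int:
        # 256 possible byte values plus the padding id.
        return 256 + 1


encoder = SketchByteEncoder()
ids = encoder.encode("caf\u00e9")
padded = ids + [SketchByteEncoder.PAD_ID] * 3
restored = encoder.decode(padded)
```

Because every string has a UTF-8 byte representation, a byte-level encoder never meets an out-of-vocabulary input, which is why a byte-level fallback is a natural way to make a subword encoder invertible. The cost is longer id sequences than a token or subword vocabulary would produce.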