Module: tfds.features.text

Defined in core/features/text/

Text utilities.

tfds includes a set of TextEncoders as well as a Tokenizer to enable expressive, performant, and reproducible natural language research.

Classes

class ByteTextEncoder: Byte-encodes text.

class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.

class TextEncoder: Abstract base class for converting between text and integers.

class TextEncoderConfig: Configuration for tfds.features.Text.

class Tokenizer: Splits a string into tokens, and joins them back.

class TokenTextEncoder: TextEncoder backed by a list of tokens.
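
The roles listed above can be illustrated with a small plain-Python sketch. It does not import tfds and is not the library's implementation: every class name below (`SketchTextEncoder`, `SketchByteEncoder`, `SketchTokenizer`, `SketchTokenEncoder`) is hypothetical, and the choices to reserve id 0 for padding, to split on `\w+|\W+`, and to use a single out-of-vocabulary id are assumptions made for illustration only.

```python
import re
from abc import ABC, abstractmethod


class SketchTextEncoder(ABC):
    """Hypothetical abstract base: converts between text and integer ids."""

    @abstractmethod
    def encode(self, text):
        """Return a list of integer ids for `text`."""

    @abstractmethod
    def decode(self, ids):
        """Return the text represented by `ids`."""


class SketchByteEncoder(SketchTextEncoder):
    """Byte-encodes text. Assumption: id 0 is padding, so byte b maps to b + 1."""

    def encode(self, text):
        return [b + 1 for b in text.encode("utf-8")]

    def decode(self, ids):
        # Skip padding ids (0) and shift the rest back to raw byte values.
        raw = bytes(i - 1 for i in ids if i > 0)
        return raw.decode("utf-8", errors="replace")


class SketchTokenizer:
    """Splits a string into tokens and joins them back without loss."""

    # Assumption: alternate runs of word and non-word characters.
    _pattern = re.compile(r"\w+|\W+")

    def tokenize(self, text):
        return self._pattern.findall(text)

    def join(self, tokens):
        return "".join(tokens)


class SketchTokenEncoder(SketchTextEncoder):
    """Encoder backed by a list of tokens; unknown tokens share one extra id."""

    def __init__(self, vocab):
        self._tokens = list(vocab)
        # Ids start at 1 so that 0 stays free for padding (sketch assumption).
        self._ids = {tok: i + 1 for i, tok in enumerate(self._tokens)}
        self._oov_id = len(self._tokens) + 1
        self._tokenizer = SketchTokenizer()

    def encode(self, text):
        tokens = self._tokenizer.tokenize(text)
        return [self._ids.get(tok, self._oov_id) for tok in tokens]

    def decode(self, ids):
        pieces = [
            self._tokens[i - 1] if 1 <= i <= len(self._tokens) else "<UNK>"
            for i in ids
        ]
        return self._tokenizer.join(pieces)


byte_encoder = SketchByteEncoder()
byte_ids = byte_encoder.encode("hi")          # [105, 106]
round_trip = byte_encoder.decode(byte_ids)    # "hi"

token_encoder = SketchTokenEncoder(["hello", ", ", "world"])
token_ids = token_encoder.encode("hello, world")  # [1, 2, 3]
```

In this sketch the byte encoder is always invertible, while the token encoder loses information for any token outside its vocabulary. A subword encoder with a byte-level fallback sits between the two: it can keep a compact vocabulary and still decode every input exactly.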