tfds.features.text.TextEncoder

Class TextEncoder

Abstract base class for converting between text and integers.

A note on padding:

Because text data is typically variable length and nearly always requires padding during training, ID 0 is always reserved for padding. To accommodate this, all TextEncoders follow these conventions:

  • encode: never returns ID 0 (all IDs are 1 or greater)
  • decode: drops any 0 in the input IDs
  • vocab_size: includes ID 0

New subclasses should be careful to match this behavior.
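
The three conventions above can be illustrated with a minimal sketch. `ToyCharEncoder` is a hypothetical class invented for this example; it deliberately does not subclass the real `TextEncoder`, so it runs without TensorFlow Datasets installed, and it only mirrors the method names documented on this page.

```python
class ToyCharEncoder:
    """Hypothetical character-level encoder that follows the padding conventions."""

    def __init__(self, alphabet):
        self._chars = list(alphabet)
        # Shift every ID up by one so that 0 stays free for padding.
        self._char_to_id = {c: i + 1 for i, c in enumerate(self._chars)}

    @property
    def vocab_size(self):
        # Counts the reserved padding ID 0 in addition to the real characters.
        return len(self._chars) + 1

    def encode(self, s):
        # Every returned ID is >= 1. Unknown characters raise KeyError in
        # this sketch; a real encoder would need its own policy for them.
        return [self._char_to_id[c] for c in s]

    def decode(self, ids):
        # Padding (ID 0) is dropped from the input before decoding.
        return "".join(self._chars[i - 1] for i in ids if i != 0)


encoder = ToyCharEncoder("abc")
ids = encoder.encode("cab")          # IDs start at 1, never 0
text = encoder.decode(ids + [0, 0])  # trailing padding is ignored
```

With the three-character alphabet assumed here, `vocab_size` is 4: three real IDs plus the reserved padding ID.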

Properties

vocab_size

Size of the vocabulary, including the reserved padding ID 0. encode produces ints in [1, vocab_size).

Methods

decode

decode(ids)

Decodes a list of integers into text.

encode

encode(s)

Encodes text into a list of integers.
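
As a rough illustration of why ID 0 is reserved, the sketch below pads a batch of encoded sequences to a common length and then decodes each row. The `encode` and `decode` functions, the three-word vocabulary, and the `pad_batch` helper are all stand-ins written for this example, not part of the library.

```python
# Hypothetical vocabulary; real IDs start at 1 because 0 is the padding ID.
vocab = ["the", "cat", "sat"]
word_to_id = {word: i + 1 for i, word in enumerate(vocab)}


def encode(s):
    """Stand-in for TextEncoder.encode: whitespace-split words to IDs >= 1."""
    return [word_to_id[word] for word in s.split()]


def decode(ids):
    """Stand-in for TextEncoder.decode: skips padding ID 0."""
    return " ".join(vocab[i - 1] for i in ids if i != 0)


def pad_batch(sequences, pad_id=0):
    """Right-pads each list of IDs with pad_id up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


batch = pad_batch([encode("the cat sat"), encode("cat")])
decoded = [decode(row) for row in batch]
```

Because `encode` never emits 0, padding cannot be confused with a real token, and `decode` recovers the original strings from the padded rows.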

load_from_file

@classmethod
load_from_file(
    cls,
    filename_prefix
)

Loads an encoder from file. Inverse of save_to_file.

save_to_file

save_to_file(filename_prefix)

Stores the encoder to file. Inverse of load_from_file.
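
The save/load round trip might look like the following sketch. `ToyWordEncoder` and its `.toyvocab` suffix are invented for this example; the parameter name `filename_prefix` suggests that an encoder appends its own suffix to the prefix, but the on-disk format used here is an assumption and not the library's.

```python
import os
import tempfile


class ToyWordEncoder:
    """Hypothetical word-level encoder; not part of tfds."""

    _SUFFIX = ".toyvocab"  # invented suffix for this sketch

    def __init__(self, vocab_list):
        self._words = list(vocab_list)
        # ID 0 is reserved for padding, so real tokens start at 1.
        self._word_to_id = {w: i + 1 for i, w in enumerate(self._words)}

    @property
    def vocab_size(self):
        return len(self._words) + 1  # +1 accounts for the padding ID 0

    def encode(self, s):
        return [self._word_to_id[w] for w in s.split()]

    def decode(self, ids):
        return " ".join(self._words[i - 1] for i in ids if i != 0)

    def save_to_file(self, filename_prefix):
        # Write one vocabulary word per line.
        with open(filename_prefix + self._SUFFIX, "w", encoding="utf-8") as f:
            f.write("\n".join(self._words))

    @classmethod
    def load_from_file(cls, filename_prefix):
        # Rebuild the encoder from the saved word list.
        with open(filename_prefix + cls._SUFFIX, encoding="utf-8") as f:
            return cls(f.read().split("\n"))


encoder = ToyWordEncoder(["hello", "world"])
with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = os.path.join(tmp_dir, "toy_encoder")
    encoder.save_to_file(prefix)
    restored = ToyWordEncoder.load_from_file(prefix)
```

The restored encoder should assign the same IDs as the original, which is what "inverse" means for this pair of methods.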