
tfds.features.text.Tokenizer

Splits a string into tokens, and joins them back.

tfds.features.text.Tokenizer(
    alphanum_only=True, reserved_tokens=None
)

Args:

  • alphanum_only: bool. If True, only alphanumeric tokens are returned and non-alphanumeric characters are dropped. If False, all characters are kept, and each token is still either entirely alphanumeric or entirely non-alphanumeric.
  • reserved_tokens: list<str>. Strings that, when they occur in the input string s, are preserved as whole tokens even if they mix alphanumeric and non-alphanumeric characters.
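
The splitting rules described by these two arguments can be illustrated with a small standalone sketch. This is not the library's source: `sketch_tokenize` is a hypothetical helper written only with Python's `re` module, and it assumes that "alphanumeric" corresponds to the regex class `\w` (which also matches underscores).

```python
import re


def sketch_tokenize(s, alphanum_only=True, reserved_tokens=None):
    """Hypothetical re-implementation of the documented splitting rules."""
    reserved = list(reserved_tokens or [])
    if reserved:
        # The capture group makes re.split keep the reserved tokens.
        # Longer tokens come first so they win over shorter overlapping ones.
        ordered = sorted(reserved, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
        pieces = re.split(pattern, s)
    else:
        pieces = [s]

    tokens = []
    for piece in pieces:
        if piece in reserved:
            # Reserved tokens are kept whole, even with mixed characters.
            tokens.append(piece)
        elif alphanum_only:
            # Drop every run of non-alphanumeric characters.
            tokens.extend(re.split(r"\W+", piece))
        else:
            # Keep runs of non-alphanumeric characters as their own tokens.
            tokens.extend(re.split(r"(\W+)", piece))
    # re.split can produce empty strings at the edges; discard them.
    return [t for t in tokens if t]


print(sketch_tokenize("Hello, world!"))
# ['Hello', 'world']
print(sketch_tokenize("Hello, world!", alphanum_only=False))
# ['Hello', ', ', 'world', '!']
print(sketch_tokenize("a <EOS> b", reserved_tokens=["<EOS>"]))
# ['a', '<EOS>', 'b']
```

In the last call, `<EOS>` survives as a single token even though it mixes alphanumeric and non-alphanumeric characters, which is the behavior reserved_tokens is documented to provide.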

Attributes:

  • alphanum_only
  • reserved_tokens

Methods

join

join(
    tokens
)

Joins tokens into a string.
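
The page does not say which separator join uses. The sketch below shows one plausible reading, stated as an assumption: when non-alphanumeric characters were kept (alphanum_only=False), plain concatenation restores the original string, and when they were dropped, tokens are separated by single spaces. `sketch_join` is a hypothetical helper, not the library method.

```python
def sketch_join(tokens, alphanum_only=True):
    """Hypothetical join; the separator choice is an assumption."""
    if not tokens:
        return ""
    # Assumed: spaces stand in for the dropped characters when
    # alphanum_only=True; otherwise the tokens already contain them.
    separator = " " if alphanum_only else ""
    return separator.join(tokens)


print(sketch_join(["Hello", "world"]))
# Hello world
print(sketch_join(["Hello", ", ", "world", "!"], alphanum_only=False))
# Hello, world!
```

Under this assumption, tokenizing and then joining is lossless only when alphanum_only=False, because dropped punctuation cannot be recovered from the tokens alone.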

load_from_file

@classmethod
load_from_file(
    cls, filename_prefix
)

Restores a Tokenizer from data previously written by save_to_file with the same filename_prefix.

save_to_file

save_to_file(
    filename_prefix
)

Saves the tokenizer's settings under filename_prefix so that load_from_file can restore them.

tokenize

tokenize(
    s
)

Splits a string into tokens.