tfds.features.text.Tokenizer

Splits a string into tokens, and joins them back.

alphanum_only: bool. If True, only alphanumeric tokens are parsed out (non-alphanumeric characters are dropped); otherwise all characters are kept, and each token is still either entirely alphanumeric or entirely non-alphanumeric.
reserved_tokens: list<str>. Strings that, when they occur in the string being tokenized, are preserved as whole tokens even if they mix alphanumeric and non-alphanumeric characters.
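
The behavior described by these two arguments can be illustrated with a small standalone sketch. This is not the library's implementation: `sketch_tokenize` is a hypothetical helper written with Python's `re` module, and it assumes that "alphanumeric" means the regex class `\w` (which also matches underscores).

```python
import re


def sketch_tokenize(s, alphanum_only=True, reserved_tokens=()):
    """Hypothetical stand-in that mimics the documented splitting rules."""
    reserved = set(reserved_tokens)
    if reserved:
        # Split around reserved tokens first; the capturing group keeps them.
        # Longer tokens come first so they win over shorter overlapping ones.
        alternatives = sorted(reserved, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in alternatives) + ")"
        pieces = re.split(pattern, s)
    else:
        pieces = [s]

    tokens = []
    for piece in pieces:
        if piece in reserved:
            # Reserved tokens are kept whole, even with mixed characters.
            tokens.append(piece)
        elif alphanum_only:
            # Separators are dropped entirely.
            tokens.extend(re.split(r"\W+", piece))
        else:
            # The capturing group keeps the separators as their own tokens.
            tokens.extend(re.split(r"(\W+)", piece))
    # re.split can yield empty strings at the edges; discard them.
    return [t for t in tokens if t]


print(sketch_tokenize("Hello, world!"))
print(sketch_tokenize("Hello, world!", alphanum_only=False))
print(sketch_tokenize("Hi <EOS> there", reserved_tokens=["<EOS>"]))
```

With `alphanum_only=False` every character of the input ends up in some token, so concatenating the tokens reproduces the original string.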

Attributes

alphanum_only

reserved_tokens

Methods

join

Joins tokens into a string.
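
As a rough illustration of what joining can look like, the sketch below uses a hypothetical helper, `sketch_join`. It rests on an assumption rather than on anything this page states: when non-alphanumeric characters were dropped during tokenization, tokens are glued with a single space, and when all characters were kept, plain concatenation restores the original string.

```python
def sketch_join(tokens, alphanum_only=True):
    """Hypothetical stand-in for joining tokens back into a string."""
    # Assumption: a space separates tokens when separators were dropped;
    # otherwise the separators are themselves tokens, so nothing is added.
    glue = " " if alphanum_only else ""
    return glue.join(tokens)


print(sketch_join(["Hello", "world"]))
print(sketch_join(["Hello", ", ", "world", "!"], alphanum_only=False))
```

Under this assumption, only the `alphanum_only=False` case round-trips exactly, because the dropped separators cannot be recovered from the tokens alone.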

load_from_file

save_to_file

tokenize

Splits a string into tokens.