Class Tokenizer
Splits a string into tokens, and joins them back.
__init__
__init__(
    alphanum_only=True,
    reserved_tokens=None
)
Constructs a Tokenizer.
Note that the Tokenizer is invertible if alphanum_only=False, i.e. s == t.join(t.tokenize(s)).
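The invertibility note above can be illustrated with a small standalone sketch. This is not the library's implementation: sketch_tokenize and sketch_join are hypothetical helpers, and two behaviors are assumptions made for illustration only: "alphanumeric" is taken to mean the regex class \w (which also matches underscores), and tokens are joined with a single space when alphanum_only=True.

```python
import re

# Hypothetical stand-in for illustration; not the library's code.
# Assumption: "alphanumeric" means the regex class \w (includes "_").
_RUNS = re.compile(r"\w+|\W+")


def sketch_tokenize(s, alphanum_only=True):
    """Split s into maximal runs of alphanumeric or non-alphanumeric chars."""
    tokens = _RUNS.findall(s)
    if alphanum_only:
        # Drop the non-alphanumeric runs; this is what loses information.
        tokens = [t for t in tokens if re.match(r"\w", t)]
    return tokens


def sketch_join(tokens, alphanum_only=True):
    """Join tokens back into a string.

    Assumption: a single space separates tokens when alphanum_only=True.
    """
    return (" " if alphanum_only else "").join(tokens)


s = "Hello, world! 42"

# alphanum_only=False keeps every character, so the round trip is exact.
all_tokens = sketch_tokenize(s, alphanum_only=False)
assert sketch_join(all_tokens, alphanum_only=False) == s

# alphanum_only=True drops punctuation and spacing, so s is not recovered.
alnum_tokens = sketch_tokenize(s, alphanum_only=True)
assert sketch_join(alnum_tokens, alphanum_only=True) != s
```

In this sketch the round trip holds only because no characters are discarded; once non-alphanumeric runs are dropped, the original spacing and punctuation cannot be reconstructed.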
Args:
alphanum_only: bool, if True, only parse out alphanumeric tokens (non-alphanumeric characters are dropped); otherwise, keep all characters (individual tokens will still be either all alphanumeric or all non-alphanumeric).
reserved_tokens: list<str>, a list of strings that, if any are in s, will be preserved as whole tokens, even if they contain mixed alphanumeric/non-alphanumeric characters.
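The effect of reserved_tokens can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: sketch_tokenize_reserved is a hypothetical helper, "alphanumeric" is taken to mean the regex class \w, and longer reserved tokens are assumed to win when several could match at the same position.

```python
import re


def sketch_tokenize_reserved(s, reserved_tokens=(), alphanum_only=True):
    """Hypothetical sketch: keep reserved tokens whole, split the rest."""
    if reserved_tokens:
        # Assumption: try longer reserved tokens first.
        ordered = sorted(reserved_tokens, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
        # With one capturing group, re.split puts the matched reserved
        # tokens at the odd indices of the result.
        pieces = re.split(pattern, s)
    else:
        pieces = [s]

    tokens = []
    for i, piece in enumerate(pieces):
        if not piece:
            continue
        if i % 2 == 1:
            # A reserved token: preserved as-is, even if it mixes
            # alphanumeric and non-alphanumeric characters.
            tokens.append(piece)
            continue
        for run in re.findall(r"\w+|\W+", piece):
            if alphanum_only and not re.match(r"\w", run):
                continue  # drop non-alphanumeric runs
            tokens.append(run)
    return tokens


s = "see <EOS> now"
print(sketch_tokenize_reserved(s))                               # no reserved tokens
print(sketch_tokenize_reserved(s, reserved_tokens=["<EOS>"]))    # "<EOS>" kept whole
```

Without reserved tokens, the string "<EOS>" is broken at the boundaries between its alphanumeric and non-alphanumeric characters; with it reserved, it survives as a single token in either alphanum_only mode.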
Properties
alphanum_only
reserved_tokens
Methods
join
join(tokens)
Joins tokens into a string.
load_from_file
@classmethod
load_from_file(
    cls,
    filename_prefix
)
save_to_file
save_to_file(filename_prefix)
tokenize
tokenize(s)
Splits a string into tokens.