tfds.deprecated.text.TokenTextEncoder
TextEncoder backed by a list of tokens.
Inherits From: TextEncoder
tfds.deprecated.text.TokenTextEncoder(
    vocab_list,
    oov_buckets=1,
    oov_token='UNK',
    lowercase=False,
    tokenizer=None,
    strip_vocab=True,
    decode_token_separator=' '
)
Tokenization splits on (and drops) non-alphanumeric characters with
regex "\W+".
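The splitting rule described above can be approximated with Python's standard `re` module. This is a minimal sketch of the regex behaviour only, not the library's `Tokenizer`; the helper name `split_on_non_alphanumeric` is hypothetical. Note that `\W` treats the underscore as a word character, so underscores are kept.

```python
import re


def split_on_non_alphanumeric(text):
    """Split text on runs of non-word characters and drop empty pieces."""
    # re.split can yield empty strings at the edges; filter them out.
    return [piece for piece in re.split(r"\W+", text) if piece]


tokens = split_on_non_alphanumeric("Hello, world! It's 2024.")
print(tokens)  # ['Hello', 'world', 'It', 's', '2024']
```

Punctuation and whitespace are dropped, so the original string cannot be reconstructed exactly from the tokens.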
| Args | Description |
|---|---|
| `vocab_list` | `list<str>`, list of tokens. |
| `oov_buckets` | `int`, the number of `int`s to reserve for out-of-vocabulary (OOV) hash buckets. Tokens that are OOV will be hash-modded into an OOV bucket in `encode`. |
| `oov_token` | `str`, the string to use for OOV ids in `decode`. |
| `lowercase` | `bool`, whether to make all text and tokens lowercase. |
| `tokenizer` | `Tokenizer`, responsible for converting incoming text into a list of tokens. |
| `strip_vocab` | `bool`, whether to strip whitespace from the beginning and end of elements of `vocab_list`. |
| `decode_token_separator` | `str`, the string used to separate tokens when decoding. |
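To make "hash-modded into an OOV bucket" concrete, here is a minimal sketch of the idea. The helper `oov_bucket_id` is hypothetical, and both the choice of MD5 as the hash and the placement of bucket ids directly after the in-vocabulary ids are assumptions made for illustration; the library's actual hashing and id layout may differ.

```python
import hashlib


def oov_bucket_id(token, vocab_len, oov_buckets):
    """Map an OOV token to an id in [vocab_len, vocab_len + oov_buckets)."""
    if oov_buckets <= 0:
        raise ValueError("No OOV buckets reserved for token %r" % token)
    # Assumption: a stable hash (MD5 here) so the same token always
    # lands in the same bucket across runs.
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return vocab_len + int(digest, 16) % oov_buckets


vocab = ["hello", "world"]
for word in ["zebra", "quokka", "zebra"]:
    print(word, oov_bucket_id(word, len(vocab), oov_buckets=3))
```

With a single bucket, every unknown token shares one id. With more buckets, distinct unknown tokens may receive distinct ids, at the cost of a larger id space.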
| Attributes | Description |
|---|---|
| `lowercase` | |
| `oov_token` | |
| `tokenizer` | |
| `tokens` | |
| `vocab_size` | Size of the vocabulary. Encoded ids are ints in the range [1, vocab_size). |
Methods
decode
decode(
ids
)
Decodes a list of integers into text.
encode
encode(
s
)
Encodes text into a list of integers.
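The relationship between `encode`, `decode`, `oov_token`, and `decode_token_separator` can be illustrated with a small stand-in class. `ToyTokenEncoder` below is hypothetical and is not the tfds implementation: it assumes a single OOV bucket, assumes id 0 is left unused (for example, for padding) so that ids fall in [1, vocab_size), and uses the `\W+` split as its only tokenizer.

```python
import re


class ToyTokenEncoder:
    """Illustrative stand-in for a token-list encoder (not the tfds class)."""

    def __init__(self, vocab_list, oov_token="UNK", lowercase=False,
                 decode_token_separator=" "):
        self.lowercase = lowercase
        self.oov_token = oov_token
        self.separator = decode_token_separator
        self.tokens = [t.strip() for t in vocab_list]
        if lowercase:
            self.tokens = [t.lower() for t in self.tokens]
        # Assumption: ids start at 1; id 0 is left unused.
        self._token_to_id = {t: i + 1 for i, t in enumerate(self.tokens)}
        # Assumption: one OOV bucket placed right after the vocabulary.
        self._oov_id = len(self.tokens) + 1

    @property
    def vocab_size(self):
        # Unused id 0 + vocabulary tokens + one OOV bucket.
        return len(self.tokens) + 2

    def encode(self, s):
        if self.lowercase:
            s = s.lower()
        words = [w for w in re.split(r"\W+", s) if w]
        return [self._token_to_id.get(w, self._oov_id) for w in words]

    def decode(self, ids):
        return self.separator.join(
            self.tokens[i - 1] if 1 <= i <= len(self.tokens) else self.oov_token
            for i in ids
        )


encoder = ToyTokenEncoder(["hello", "world"])
ids = encoder.encode("hello, big world")
print(ids)                  # [1, 3, 2]
print(encoder.decode(ids))  # hello UNK world
```

The round trip is lossy: punctuation is dropped during tokenization, and any out-of-vocabulary token comes back as the `oov_token` string.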
load_from_file
@classmethod
load_from_file(
filename_prefix
)
Load from file. Inverse of save_to_file.
save_to_file
save_to_file(
filename_prefix
)
Store to file. Inverse of load_from_file.
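The inverse relationship between saving and loading by filename prefix can be sketched in plain Python. The functions `save_vocab` and `load_vocab`, the `.vocab.json` suffix, and the JSON layout are all hypothetical and chosen only for illustration; they do not reflect the file format that `save_to_file` actually writes.

```python
import json
import os
import tempfile


def save_vocab(filename_prefix, tokens, oov_buckets=1):
    """Write encoder state to '<prefix>.vocab.json' (hypothetical suffix)."""
    path = filename_prefix + ".vocab.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"tokens": tokens, "oov_buckets": oov_buckets}, f)
    return path


def load_vocab(filename_prefix):
    """Read back the state written by save_vocab for the same prefix."""
    with open(filename_prefix + ".vocab.json", encoding="utf-8") as f:
        state = json.load(f)
    return state["tokens"], state["oov_buckets"]


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "my_encoder")
    save_vocab(prefix, ["hello", "world"], oov_buckets=2)
    restored = load_vocab(prefix)

print(restored)  # (['hello', 'world'], 2)
```

The caller passes the same prefix to both functions and never names the underlying file directly, which is the pattern the `filename_prefix` argument suggests.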
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.