tft.apply_vocabulary(x, deferred_vocab_filename_tensor, default_value=-1, num_oov_buckets=0, lookup_fn=None, name=None)
Maps x to a vocabulary specified by the deferred tensor.
This function also writes domain statistics about the vocabulary min and max values. Note that the min and max are inclusive, and depend on the vocab size, num_oov_buckets and default_value.
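As a rough illustration of how the inclusive min and max could depend on these three values, here is a minimal plain-Python sketch. It is not TFT's implementation: `vocab_id_range` is a hypothetical helper, and it assumes that OOV buckets, when enabled, occupy the IDs immediately after the vocabulary.

```python
def vocab_id_range(vocab_size, default_value=-1, num_oov_buckets=0):
    """Return an inclusive (min, max) range of IDs a lookup could emit.

    Hypothetical helper for illustration only; assumes OOV bucket IDs
    directly follow the in-vocabulary IDs.
    """
    # In-vocabulary tokens map to consecutive IDs starting from zero.
    min_id, max_id = 0, vocab_size - 1
    if num_oov_buckets > 0:
        # OOV tokens land in extra buckets after the vocabulary.
        max_id += num_oov_buckets
    else:
        # OOV tokens map to default_value, which may widen the range.
        min_id = min(min_id, default_value)
        max_id = max(max_id, default_value)
    return min_id, max_id


print(vocab_id_range(3))                     # (-1, 2)
print(vocab_id_range(3, num_oov_buckets=2))  # (0, 4)
```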
If a token contains the '\n' or '\r' character, or is empty, it is discarded, because vocabularies are currently written as text files. This behavior will likely be fixed or improved in the future.
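The discard rule described above can be approximated with a small filter. This is only a sketch of the documented behavior, not the library's code; `keep_token` is a hypothetical name.

```python
def keep_token(token):
    """Return True if the token fits on a single line of a text vocabulary file.

    Hypothetical helper: rejects empty tokens and tokens containing
    a newline or carriage return.
    """
    return token != "" and "\n" not in token and "\r" not in token


tokens = ["apple", "", "two\nlines", "carriage\rreturn", "banana"]
kept = [t for t in tokens if keep_token(t)]
print(kept)  # ['apple', 'banana']
```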
x: A categorical SparseTensor of type tf.string or tf.int[8|16|32|64] to which the vocabulary transformation should be applied. The column names are those intended for the transformed tensors.
deferred_vocab_filename_tensor: The deferred vocab filename tensor as returned by tft.vocabulary, as long as the frequencies were not stored.
default_value: The value to use for out-of-vocabulary values, unless 'num_oov_buckets' is greater than zero.
num_oov_buckets: Any lookup of an out-of-vocabulary token will return a bucket ID based on its hash if num_oov_buckets is greater than zero. Otherwise it is assigned the default_value.
lookup_fn: Optional lookup function. If specified, it should take a tensor and a deferred vocab filename as input and return a lookup op along with the table size. By default, apply_vocab constructs a StaticHashTable for the table lookup.
name: (Optional) A name for this operation.
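The interaction between default_value and num_oov_buckets can be sketched in plain Python. This emulates the documented semantics only and is not how TFT performs the lookup: `lookup_ids` is a hypothetical helper, `zlib.crc32` stands in for whatever hash the real lookup table uses, and placing bucket IDs directly after the vocabulary is an assumption.

```python
import zlib


def lookup_ids(tokens, vocab, default_value=-1, num_oov_buckets=0):
    """Map tokens to integer IDs, emulating the documented OOV handling."""
    # In-vocabulary tokens get consecutive IDs starting from zero.
    index = {token: i for i, token in enumerate(vocab)}
    ids = []
    for token in tokens:
        if token in index:
            ids.append(index[token])
        elif num_oov_buckets > 0:
            # Hash the token into one of the buckets after the vocabulary
            # (assumed layout; crc32 is a stand-in hash).
            bucket = zlib.crc32(token.encode("utf-8")) % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            # No buckets configured: fall back to default_value.
            ids.append(default_value)
    return ids


vocab = ["apple", "banana", "cherry"]
print(lookup_ids(["banana", "kiwi"], vocab))                     # [1, -1]
print(lookup_ids(["banana", "kiwi"], vocab, num_oov_buckets=2))  # "kiwi" gets ID 3 or 4
```

With buckets enabled, default_value is never returned, which matches the "unless num_oov_buckets is greater than zero" wording in the argument descriptions.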
Returns: A SparseTensor where each string value is mapped to an integer. Each unique string value that appears in the vocabulary is mapped to a different integer, and the integers are consecutive starting from zero. Any string value not in the vocabulary is