
Maps x to a vocabulary specified by the deferred tensor.

This function also writes domain statistics about the vocabulary min and max values. Note that the min and max are inclusive, and depend on the vocab size, num_oov_buckets and default_value.
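As an illustration of how the inclusive range can depend on these three values, here is a minimal pure-Python sketch. The helper name `vocab_domain` is hypothetical, and the rule it encodes (OOV buckets are appended after the vocabulary IDs; otherwise default_value may widen the range) is an assumption inferred from the description above, not the library's actual code.

```python
def vocab_domain(vocab_size, num_oov_buckets=0, default_value=-1):
    """Return the inclusive (min, max) of IDs a lookup could produce.

    Hypothetical helper; the rule below is an assumption, not TFT's code.
    """
    if num_oov_buckets > 0:
        # Assumed: OOV tokens land in buckets placed right after the vocab IDs.
        return 0, vocab_size + num_oov_buckets - 1
    # Assumed: OOV tokens map to default_value, which may lie outside
    # the range [0, vocab_size - 1] and therefore widens it.
    return min(0, default_value), max(vocab_size - 1, default_value)


print(vocab_domain(3))                     # (-1, 2)
print(vocab_domain(3, num_oov_buckets=2))  # (0, 4)
```

With the default of -1 and no OOV buckets, the minimum is -1 rather than 0, which is why the statistics depend on more than the vocabulary size alone.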

Tokens that are empty or that contain the '\n' or '\r' characters are discarded, because vocabularies are currently written as text files. This behavior will likely be fixed/improved in the future.
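The filtering rule can be pictured with a short pure-Python sketch. The function name `writable_tokens` is hypothetical and only mirrors the rule stated above; it is not how the library implements the check.

```python
def writable_tokens(tokens):
    """Keep only tokens that can occupy exactly one line of a text file.

    Hypothetical helper illustrating the stated rule: drop empty tokens
    and tokens containing a newline or carriage return.
    """
    return [t for t in tokens if t and "\n" not in t and "\r" not in t]


print(writable_tokens(["a", "", "b\nc", "d\r", "e"]))  # ['a', 'e']
```

A token with an embedded line break would be read back as two separate entries, and an empty token as a blank line, which is why neither survives a one-token-per-line format.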

x A categorical Tensor or SparseTensor of type tf.string or tf.int[8|16|32|64] to which the vocabulary transformation should be applied. The column names are those intended for the transformed tensors.
deferred_vocab_filename_tensor The deferred vocab filename tensor as returned by tft.vocabulary. This is valid only if the frequencies were not stored in the vocabulary.
default_value The value to use for out-of-vocabulary values, unless 'num_oov_buckets' is greater than zero.
num_oov_buckets Any lookup of an out-of-vocabulary token will return a bucket ID based on its hash if num_oov_buckets is greater than zero. Otherwise it is assigned the default_value.
lookup_fn Optional lookup function. If specified, it should take a tensor and a deferred vocab filename as input and return a lookup op along with the table size. By default, apply_vocab constructs a StaticHashTable for the table lookup.
name (Optional) A name for this operation.

A Tensor or SparseTensor where each string value is mapped to an integer. Each unique string value that appears in the vocabulary is mapped to a different integer, and the integers are consecutive starting from zero. Any string value not in the vocabulary is assigned default_value, or a hash-based bucket ID if num_oov_buckets is greater than zero.
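The mapping described above can be sketched in plain Python, without TensorFlow. Everything here is illustrative: `apply_vocab_sketch` is a hypothetical name, the vocabulary is passed as an in-memory list rather than a deferred filename tensor, and the MD5-based bucket choice is a stand-in assumption for whatever hash the real lookup table uses, so bucket IDs will not match actual output.

```python
import hashlib


def apply_vocab_sketch(tokens, vocab, default_value=-1, num_oov_buckets=0):
    """Map each token to its vocabulary index (hypothetical sketch)."""
    # In-vocabulary tokens get consecutive integers starting from zero.
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = []
    for tok in tokens:
        if tok in index:
            ids.append(index[tok])
        elif num_oov_buckets > 0:
            # Assumed stand-in hash: a stable digest reduced modulo the
            # bucket count, offset past the in-vocabulary IDs.
            digest = hashlib.md5(tok.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:8], "big") % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            ids.append(default_value)
    return ids


vocab = ["a", "b", "c"]
print(apply_vocab_sketch(["b", "z", "a"], vocab))                     # [1, -1, 0]
print(apply_vocab_sketch(["b", "z", "a"], vocab, num_oov_buckets=1))  # [1, 3, 0]
```

With a single OOV bucket every unknown token shares the ID that follows the last vocabulary entry. With more buckets, unknown tokens are spread across several IDs, and default_value is no longer used.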