tft.apply_vocabulary

tft.apply_vocabulary(
    x,
    deferred_vocab_filename_tensor,
    default_value=-1,
    num_oov_buckets=0,
    lookup_fn=None,
    name=None
)

Maps x to a vocabulary specified by the deferred tensor.

This function also writes domain statistics about the vocabulary min and max values. Note that the min and max are inclusive, and depend on the vocab size, num_oov_buckets and default_value.
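As a rough illustration of how the inclusive bounds might depend on those three quantities, the sketch below computes them in plain Python. The helper name `vocab_domain_bounds` is hypothetical. The rule it encodes (OOV buckets extend the upper bound, otherwise `default_value` widens the range) is an assumption drawn from the description above, not a copy of the library code.

```python
def vocab_domain_bounds(vocab_size, default_value=-1, num_oov_buckets=0):
    """Return (min, max), both inclusive, of the ids a lookup can produce.

    Hypothetical helper for illustration only; not part of tft.
    """
    # In-vocabulary ids are consecutive integers starting from zero.
    min_value, max_value = 0, vocab_size - 1
    if num_oov_buckets > 0:
        # Assumption: OOV bucket ids follow directly after the vocabulary ids.
        max_value += num_oov_buckets
    else:
        # Without buckets, OOV values map to default_value, which may lie
        # outside the range [0, vocab_size - 1].
        min_value = min(min_value, default_value)
        max_value = max(max_value, default_value)
    return min_value, max_value


print(vocab_domain_bounds(3))                     # (-1, 2)
print(vocab_domain_bounds(3, num_oov_buckets=2))  # (0, 4)
```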

If a token is empty or contains a '\n' or '\r' character, it is discarded, because vocabularies are currently written as text files. This behavior will likely be fixed or improved in the future.
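The discard rule can be pictured with a small plain-Python filter. This is a minimal sketch of the rule as stated above, with a hypothetical helper name `keep_token`; it is not how the library itself performs the filtering.

```python
def keep_token(token):
    """Return True if the token can be stored as one line of a text file.

    Hypothetical helper for illustration only; not part of tft.
    """
    # Empty tokens and tokens containing line-break characters cannot be
    # represented unambiguously in a one-token-per-line vocabulary file.
    return token != "" and "\n" not in token and "\r" not in token


tokens = ["cat", "", "dog\n", "bi\rrd", "fish"]
print([t for t in tokens if keep_token(t)])  # ['cat', 'fish']
```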

Args:

  • x: A categorical Tensor or SparseTensor of type tf.string or tf.int[8|16|32|64] to which the vocabulary transformation should be applied. The column names are those intended for the transformed tensors.
  • deferred_vocab_filename_tensor: The deferred vocab filename tensor as returned by tft.vocabulary, provided that the frequencies were not stored.
  • default_value: The value to use for out-of-vocabulary values, unless 'num_oov_buckets' is greater than zero.
  • num_oov_buckets: Any lookup of an out-of-vocabulary token will return a bucket ID based on its hash if num_oov_buckets is greater than zero. Otherwise it is assigned the default_value.
  • lookup_fn: Optional lookup function. If specified, it should take a tensor and a deferred vocab filename as input and return a lookup op along with the table size. By default, apply_vocabulary constructs a StaticHashTable for the table lookup.
  • name: (Optional) A name for this operation.

Returns:

A Tensor or SparseTensor where each string value is mapped to an integer. Each unique string value that appears in the vocabulary is mapped to a different integer, and the integers are consecutive starting from zero. Any string value not in the vocabulary is assigned default_value, unless num_oov_buckets is greater than zero.
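The mapping described above can be emulated in plain Python to make the id assignment concrete. This is a sketch under stated assumptions, not the library's implementation: `apply_vocab_sketch` is a hypothetical name, `zlib.crc32` stands in for whatever hash the real lookup table uses, and placing OOV bucket ids directly after the vocabulary ids is assumed.

```python
import zlib


def apply_vocab_sketch(tokens, vocab, default_value=-1, num_oov_buckets=0):
    """Map each token to an integer id, emulating the documented behavior.

    Hypothetical helper for illustration only; not part of tft.
    """
    # A token's position in the vocabulary is its id, so ids are
    # consecutive integers starting from zero.
    index = {token: i for i, token in enumerate(vocab)}
    ids = []
    for token in tokens:
        if token in index:
            ids.append(index[token])
        elif num_oov_buckets > 0:
            # Assumption: crc32 is a stand-in hash, and bucket ids start
            # right after the last vocabulary id.
            bucket = zlib.crc32(token.encode("utf-8")) % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            # No OOV buckets: fall back to default_value.
            ids.append(default_value)
    return ids


vocab = ["the", "cat", "sat"]
print(apply_vocab_sketch(["cat", "the", "dog"], vocab))  # [1, 0, -1]
```

With `num_oov_buckets=1`, the same out-of-vocabulary token `"dog"` would instead receive id 3 in this sketch, the single bucket that follows the three vocabulary ids.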