tft.apply_vocabulary

Maps x to a vocabulary specified by the deferred tensor.

tft.apply_vocabulary(
    x: common_types.ConsistentTensorType,
    deferred_vocab_filename_tensor: common_types.TemporaryAnalyzerOutputType,
    *,
    default_value: Any = -1,
    num_oov_buckets: int = 0,
    lookup_fn: Optional[Callable[[common_types.TensorType, tf.Tensor], Tuple[tf.Tensor, tf.Tensor]]] = None,
    file_format: common_types.VocabularyFileFormatType = analyzers.DEFAULT_VOCABULARY_FILE_FORMAT,
    name: Optional[str] = None
) -> common_types.ConsistentTensorType

This function also writes domain statistics for the output: the minimum and maximum vocabulary values. Both bounds are inclusive and depend on the vocabulary size, `num_oov_buckets`, and `default_value`.
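The dependence of the domain bounds on these three quantities can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `vocabulary_domain` is a hypothetical helper, and it assumes that vocabulary entries occupy IDs `0` to `vocab_size - 1`, that OOV buckets take the IDs immediately after them, and that `default_value` only enters the domain when `num_oov_buckets` is zero.

```python
from typing import Tuple


def vocabulary_domain(
    vocab_size: int, num_oov_buckets: int = 0, default_value: int = -1
) -> Tuple[int, int]:
    """Return the inclusive (min, max) of the IDs a lookup could produce.

    Hypothetical helper for illustration; not part of tft.
    """
    if num_oov_buckets > 0:
        # Assumption: in-vocabulary IDs are 0..vocab_size-1 and the OOV
        # buckets follow directly, so default_value is never emitted.
        return 0, vocab_size + num_oov_buckets - 1
    # Without OOV buckets, default_value may be emitted and can lie
    # outside the range of in-vocabulary IDs.
    return min(0, default_value), max(vocab_size - 1, default_value)


print(vocabulary_domain(1000))                      # (-1, 999)
print(vocabulary_domain(1000, num_oov_buckets=10))  # (0, 1009)
print(vocabulary_domain(1000, default_value=1000))  # (0, 1000)
```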

Args
`x`	A categorical `Tensor`, `SparseTensor`, or `RaggedTensor` of type `tf.string` or `tf.int[8|16|32|64]` to which the vocabulary transformation should be applied. The column names are those intended for the transformed tensors.
`deferred_vocab_filename_tensor`	The deferred vocab filename tensor as returned by `tft.vocabulary`, as long as the frequencies were not stored.
`default_value`	The value to use for out-of-vocabulary values, unless `num_oov_buckets` is greater than zero.
`num_oov_buckets`	Any lookup of an out-of-vocabulary token will return a bucket ID based on its hash if `num_oov_buckets` is greater than zero. Otherwise it is assigned the `default_value`.
`lookup_fn`	Optional lookup function. If specified, it should take a tensor and a deferred vocab filename as input and return a lookup `op` along with the table size. By default, `apply_vocabulary` constructs a StaticHashTable for the table lookup.
`file_format`	(Optional) A str. The format of the given vocabulary. Accepted formats are: 'tfrecord_gzip', 'text'. The default value is 'text'.
`name`	(Optional) A name for this operation.
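How `default_value` and `num_oov_buckets` interact can be shown with a small pure-Python model of the lookup. It is a sketch only: `lookup` and `build_index` are hypothetical names, the vocabulary is an in-memory list rather than a deferred vocabulary file, and `zlib.crc32` stands in for whatever hash the real table uses, so the bucket chosen for a given token will generally differ from TensorFlow's.

```python
import zlib
from typing import Dict, List, Sequence


def build_index(vocab: Sequence[str]) -> Dict[str, int]:
    """Map each vocabulary token to its position, starting from zero."""
    return {token: i for i, token in enumerate(vocab)}


def lookup(
    tokens: Sequence[str],
    vocab: Sequence[str],
    default_value: int = -1,
    num_oov_buckets: int = 0,
) -> List[int]:
    """Model of the vocabulary lookup (hypothetical helper, not tft code)."""
    index = build_index(vocab)
    ids = []
    for token in tokens:
        if token in index:
            ids.append(index[token])
        elif num_oov_buckets > 0:
            # Assumption: crc32 is a stand-in hash; bucket IDs are placed
            # directly after the in-vocabulary IDs.
            bucket = zlib.crc32(token.encode("utf-8")) % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            ids.append(default_value)
    return ids


vocab = ["the", "cat", "sat"]
print(lookup(["cat", "dog", "the"], vocab))  # [1, -1, 0]

# With two OOV buckets, "dog" maps to ID 3 or 4 instead of default_value.
print(lookup(["cat", "dog"], vocab, num_oov_buckets=2))
```

In this model `default_value` is ignored as soon as `num_oov_buckets` is positive, which matches the argument descriptions above.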

Returns
A `Tensor`, `SparseTensor`, or `RaggedTensor` where each string value is mapped to an integer. Each unique string value that appears in the vocabulary is mapped to a different integer, and the integers are consecutive starting from zero. Any string value not in the vocabulary is assigned `default_value`, or a hash-based bucket ID if `num_oov_buckets` is greater than zero.
