

String-to-id table that assigns out-of-vocabulary keys to hash buckets.

Inherits From: TrackableResource


For example, if an instance of StaticVocabularyTable is initialized with a string-to-id initializer that maps:

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(['emerson', 'lake', 'palmer']),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(
    init,
    num_oov_buckets=5)

The vocabulary table performs the following mapping:

  • emerson -> 0
  • lake -> 1
  • palmer -> 2
  • <other term> -> bucket_id, where bucket_id is between 3 and 3 + num_oov_buckets - 1 = 7, calculated by: hash(<term>) % num_oov_buckets + vocab_size
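The assignment rule above can be sketched in plain Python. This is an illustrative model only: Fingerprint64 is not in the standard library, so a SHA-256-based stand-in hash is used here, which means the concrete OOV IDs differ from the ones TensorFlow would produce, though they always land in the same [vocab_size, vocab_size + num_oov_buckets - 1] range.

```python
import hashlib

# In-vocabulary terms keep their assigned IDs.
vocab = {"emerson": 0, "lake": 1, "palmer": 2}
num_oov_buckets = 5

def lookup(term):
    """Return the vocab ID, or an OOV bucket ID in [3, 7]."""
    if term in vocab:
        return vocab[term]
    # Stand-in for Fingerprint64: a deterministic 64-bit hash of the term.
    h = int.from_bytes(hashlib.sha256(term.encode()).digest()[:8], "little")
    return h % num_oov_buckets + len(vocab)  # hash % num_oov_buckets + vocab_size

ids = [lookup(t) for t in ["emerson", "lake", "palmer", "king", "crimson"]]
# In-vocab terms map to 0, 1, 2; OOV terms land somewhere in 3..7.
assert ids[:3] == [0, 1, 2]
assert all(3 <= i <= 7 for i in ids[3:])
```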

If input_tensor is:

input_tensor = tf.constant(["emerson", "lake", "palmer",
                            "king", "crimson"])
table[input_tensor].numpy()
array([0, 1, 2, 6, 7])

If initializer is None, only out-of-vocabulary buckets are used.

Example usage:

num_oov_buckets = 3
vocab = ["emerson", "lake", "palmer", "crimnson"]
import tempfile
f = tempfile.NamedTemporaryFile(delete=False)
f.write('\n'.join(vocab).encode('utf-8'))
f.close()
init = tf.lookup.TextFileInitializer(
    f.name,
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets)
table.lookup(tf.constant(["palmer", "crimnson", "king",
                          "tarkus", "black", "moon"])).numpy()
array([2, 3, 5, 6, 6, 4])

The hash function used for generating out-of-vocabulary bucket IDs is Fingerprint64.

Note that the out-of-vocabulary bucket IDs always range from the table size up to size + num_oov_buckets - 1 regardless of the table values, which could cause unexpected collisions:

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([1, 2, 3], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(
    init,
    num_oov_buckets=1)
input_tensor = tf.constant(["emerson", "lake", "palmer", "king"])
table[input_tensor].numpy()
array([1, 2, 3, 3])
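The collision can be checked by hand: the OOV bucket IDs start at the table size (3), not after the largest stored value, so with num_oov_buckets=1 every OOV term collapses to ID 3, the same ID already assigned to "palmer". A minimal sketch of that arithmetic (the hash values below are arbitrary placeholders, since with a single bucket any hash maps to bucket 0):

```python
# Values start at 1, but the OOV range starts at the table SIZE, which is 3.
vocab_values = {"emerson": 1, "lake": 2, "palmer": 3}
num_oov_buckets = 1
vocab_size = len(vocab_values)  # 3, regardless of the stored values

def oov_bucket_id(hash_value):
    """OOV ID per the formula: hash % num_oov_buckets + vocab_size."""
    return hash_value % num_oov_buckets + vocab_size

# With a single OOV bucket, every hash value collapses to ID 3:
assert oov_bucket_id(12345) == 3
assert oov_bucket_id(0) == vocab_values["palmer"]  # the collision with "palmer"
```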

Args:

initializer: A TableInitializerBase object that contains the data used to initialize the table. If None, then only out-of-vocabulary buckets are used.
num_oov_buckets: Number of buckets to use for out-of-vocabulary keys. Must be greater than zero.
lookup_key_dtype: Data type of keys passed to lookup. Defaults to initializer.key_dtype if initializer is specified, otherwise tf.string. Must be string or integer, and must be castable to initializer.key_dtype.
name: A name for the operation (optional).