tf.compat.v1.keras.layers.experimental.preprocessing.StringLookup

Maps strings from a vocabulary to integer indices.

Inherits From: StringLookup

This layer translates a set of arbitrary strings into an integer output via a table-based lookup, with optional out-of-vocabulary handling.

If desired, the user can call this layer's adapt() method on a data set, which will analyze the data set, determine the frequency of individual string values, and create a vocabulary from them. This vocabulary can have unlimited size or be capped, depending on the configuration options for this layer; if there are more unique values in the input than the maximum vocabulary size, the most frequent terms will be used to create the vocabulary.

Examples:

Creating a lookup layer with a known vocabulary

This example creates a lookup layer with a pre-existing vocabulary.

vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
layer = StringLookup(vocabulary=vocab)
layer(data)
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[2, 4, 5],
       [5, 1, 3]])>

Creating a lookup layer with an adapted vocabulary

This example creates a lookup layer and generates the vocabulary by analyzing the dataset.

data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
layer = StringLookup()
layer.adapt(data)
layer.get_vocabulary()
['', '[UNK]', 'd', 'z', 'c', 'b', 'a']

Note how the mask token '' and the OOV token [UNK] have been added to the vocabulary. The remaining tokens are sorted by frequency ('d', which has 2 occurrences, is first) then by inverse sort order.

data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
layer = StringLookup()
layer.adapt(data)
layer(data)
<tf.Tensor: shape=(2, 3),