
Category encoding layer.

This layer provides options for condensing data into a categorical encoding. It accepts integer values as inputs and outputs a dense representation of those inputs, in which each sample becomes a rank-1 tensor of numeric values describing that sample's tokens.


>>> layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
...           max_tokens=4, output_mode="count")
>>> layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=int64, numpy=
  array([[1, 1, 0, 0],
         [2, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 1]])>
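As a cross-check on the example, the count encoding can be reproduced in plain Python. This is a minimal sketch, not the layer's implementation: the helper `count_encode` is hypothetical, and it assumes every token ID lies in `[0, num_tokens)`. Raising `ValueError` for out-of-range IDs is a choice made for the sketch, not documented behavior of the layer.

```python
from typing import List, Sequence


def count_encode(batch: Sequence[Sequence[int]], num_tokens: int) -> List[List[int]]:
    """Hypothetical sketch of output_mode="count".

    Each row of integer token IDs becomes a vector of length num_tokens
    whose slot i holds the number of times token i appears in that row.
    """
    out = []
    for sample in batch:
        row = [0] * num_tokens
        for token in sample:
            if not 0 <= token < num_tokens:
                # Sketch-only guard; the layer's own handling is not described here.
                raise ValueError(f"token {token} outside [0, {num_tokens})")
            row[token] += 1
        out.append(row)
    return out


print(count_encode([[0, 1], [0, 0], [1, 2], [3, 1]], num_tokens=4))
# [[1, 1, 0, 0], [2, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]]
```

The second sample, `[0, 0]`, shows why this is a count rather than a one-hot encoding: token 0 appears twice, so slot 0 holds 2.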

Arguments:
  max_tokens: The maximum size of the vocabulary for this layer. If None, there is no cap on the size of the vocabulary.
  output_mode: Optional specification for the output of the layer. Values can be "binary", "count" or "tf-idf", configuring the layer as follows:
    "binary": Outputs a single int array per batch item, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item.
    "count": As "binary", but the int array contains a count of the number of times the token at that index appeared in the batch item.
    "tf-idf": As "binary", but the TF-IDF algorithm is applied to find the value in each token slot.
  sparse: Boolean. If True, returns a SparseTensor instead of a dense Tensor. Defaults to False.
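The three output modes and the `sparse` option can be sketched in plain Python as well. The function names `encode` and `to_sparse` are hypothetical. The "tf-idf" branch assumes the common formulation in which each count is multiplied by a per-token inverse-document-frequency weight supplied by the caller; the layer's exact computation may differ. `to_sparse` collects the indices, values, and dense shape of the nonzero entries, the three pieces a `tf.sparse.SparseTensor` is built from.

```python
from typing import List, Optional, Sequence, Tuple


def encode(
    batch: Sequence[Sequence[int]],
    num_tokens: int,
    mode: str = "count",
    idf: Optional[Sequence[float]] = None,
) -> List[list]:
    """Hypothetical sketch of the "binary", "count" and "tf-idf" output modes."""
    rows = []
    for sample in batch:
        # Per-sample token counts: slot i = occurrences of token i.
        counts = [0] * num_tokens
        for token in sample:
            counts[token] += 1
        if mode == "binary":
            # 1 wherever the token appears at least once in this sample.
            rows.append([1 if c > 0 else 0 for c in counts])
        elif mode == "count":
            rows.append(counts)
        elif mode == "tf-idf":
            # Assumed formulation: term count times a per-token IDF weight.
            if idf is None or len(idf) != num_tokens:
                raise ValueError("tf-idf mode needs one idf weight per token")
            rows.append([c * w for c, w in zip(counts, idf)])
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return rows


def to_sparse(
    dense: Sequence[Sequence[float]],
) -> Tuple[List[Tuple[int, int]], list, Tuple[int, int]]:
    """Collect (row, column) indices, values and shape of the nonzero entries."""
    indices, values = [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                indices.append((i, j))
                values.append(v)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return indices, values, shape


batch = [[0, 1], [0, 0], [1, 2], [3, 1]]
print(encode(batch, 4, mode="binary"))
# [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]]
print(to_sparse(encode(batch, 4, mode="count")))
```

Binary mode is count mode clipped to 1, which is why the sample `[0, 0]` becomes `[1, 0, 0, 0]` rather than `[2, 0, 0, 0]`. The sparse form pays off when most slots are zero, since only the nonzero entries are stored.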



adapt(data, reset_state=True)

Fits the state of the preprocessing layer to the dataset.

Overrides the default adapt method to apply relevant preprocessing to the inputs before passing to the combiner.

Arguments:
  data: The data to train on. It can be passed either as a Dataset, or as a numpy array.
  reset_state: Optional argument specifying whether to clear the state of the layer at the start of the call to adapt. This must be True for this layer, which does not support repeated calls to adapt.

Raises:
  RuntimeError: if the layer cannot be adapted at this time.
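The text above does not say what state `adapt` fits. For the "tf-idf" mode, one plausible piece of state is a per-token document frequency turned into IDF weights, and the class below sketches that under stated assumptions. `HypotheticalTfidfFitter` is illustrative only; the smoothed formula `log(1 + N / (1 + df))` is an assumed variant, not the layer's documented formula; and raising `ValueError` for `reset_state=False` stands in for the documented restriction that repeated calls to `adapt` are unsupported.

```python
import math
from typing import Iterable, List, Optional, Sequence


class HypotheticalTfidfFitter:
    """Illustrative stand-in for IDF state that an adapt() call could fit."""

    def __init__(self, num_tokens: int) -> None:
        self.num_tokens = num_tokens
        self.idf: Optional[List[float]] = None

    def adapt(self, data: Iterable[Sequence[int]], reset_state: bool = True) -> None:
        if not reset_state:
            # Sketch of the documented restriction: state is always rebuilt from scratch.
            raise ValueError("repeated (streaming) calls to adapt are not supported")
        doc_freq = [0] * self.num_tokens  # number of samples containing each token
        num_docs = 0
        for sample in data:
            num_docs += 1
            for token in set(sample):  # count each token at most once per sample
                doc_freq[token] += 1
        # Assumed smoothed IDF; rarer tokens receive larger weights.
        self.idf = [math.log(1 + num_docs / (1 + df)) for df in doc_freq]


fitter = HypotheticalTfidfFitter(num_tokens=4)
fitter.adapt([[0, 1], [0, 0], [1, 2], [3, 1]])
print([round(w, 3) for w in fitter.idf])
# [0.847, 0.693, 1.099, 1.099]
```

Token 1 appears in three of the four samples, so it gets the smallest weight; tokens 2 and 3 each appear in only one sample and get the largest.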

