Save the date! Google I/O returns May 18-20 Register now


Category encoding layer.

Inherits From: PreprocessingLayer, Layer, Module

Used in the notebooks

Used in the guide Used in the tutorials

This layer provides options for condensing data into a categorical encoding. It accepts integer values as inputs and outputs a dense representation (one sample = 1-index tensor of float values representing data about the sample's tokens) of those inputs.


layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
          max_tokens=4, output_mode="count")
layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
  array([[1., 1., 0., 0.],
         [2., 0., 0., 0.],
         [0., 1., 1., 0.],
         [0., 1., 0., 1.]], dtype=float32)>

Examples with weighted inputs:

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
          max_tokens=4, output_mode="count")
count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]])
layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights)
<tf.Tensor: shape=(4, 4), dtype=float64, numpy=
  array([[0.1, 0.2, 0. , 0. ],
         [0.2, 0. , 0. , 0. ],
         [0. , 0.2, 0.3, 0. ],
         [0. , 0.2, 0. , 0.4]])>

Call arguments:

  • inputs: A 2D tensor (samples, timesteps).
  • count_weights: A 2D tensor in the same shape as inputs indicating the weight for each sample value when summing up in count mode. Not used in binary or tfidf mode.

max_tokens The maximum size of the vocabulary for this layer. If None, there is no cap on the size of the vocabulary.
output_mode Specification for the output of the layer. Defaults to "binary". Values can be "binary", "count" or "tf-idf", configuring the layer as follows: "binary": Outputs a single int array per batch, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item. "count": As "binary", but the int array contains a count of the number of times the token at that index appeared in the batch item. "tf-idf": As "binary", but the TF-IDF algorithm is applied to find the value in each token slot.
sparse Boolean. If true, returns a SparseTensor instead of a dense Tensor. Defaults to False.



View source

Fits the state of the preprocessing layer to the dataset.

Overrides the default adapt method to apply relevant preprocessing to the inputs before passing to the combiner.

data The data to train on. It can be passed either as a Dataset, or as a numpy array.
reset_state Optional argument specifying whether to clear the state of the layer at the start of the call to adapt. This must be True for this layer, which does not support repeated calls to adapt.

RuntimeError if the layer cannot be adapted at this time.


View source


View source