Cross Layer in Deep & Cross Network to learn explicit feature interactions.

A layer that efficiently creates explicit, bounded-degree feature interactions. The call method accepts a pair of input tensors. The first input x0 is the base layer that contains the original features (usually the embedding layer); the second input xi is the output of the previous Cross layer in the stack, i.e., the i-th Cross layer. For the first Cross layer in the stack, x0 = xi.

The output is x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi, where .* denotes elementwise multiplication and W * xi is a matrix-vector product. W can be a full-rank matrix or, to reduce the computational cost, a low-rank matrix U*V. diag_scale increases the diagonal of W to improve training stability (especially in the low-rank case).
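The formula above can be traced by hand on a small example. The following is a minimal sketch in dependency-free Python for a single example (no batch dimension), not the layer's actual implementation. The helper name cross_step and the toy values of W and bias are assumptions made for illustration, and the same W is reused for both steps only to keep the example short (each Cross layer in a real stack has its own kernel).

```python
def cross_step(x0, xi, W, bias, diag_scale=0.0):
    """One full-rank cross step for a single example, using plain lists.

    Hypothetical helper for illustration; it mirrors
    x_{i+1} = x0 .* (W * xi + bias + diag_scale * xi) + xi.
    """
    d = len(x0)
    out = []
    for r in range(d):
        # (W * xi)[r]: row r of W dotted with xi
        wx = sum(W[r][c] * xi[c] for c in range(d))
        # elementwise product with x0, then the residual "+ xi"
        out.append(x0[r] * (wx + bias[r] + diag_scale * xi[r]) + xi[r])
    return out


x0 = [1.0, 2.0]
W = [[1.0, 0.0],
     [1.0, 1.0]]   # toy kernel, assumed for the example
bias = [0.5, 0.5]  # toy bias, assumed for the example

x1 = cross_step(x0, x0, W, bias)  # first layer: xi = x0  -> [2.5, 9.0]
x2 = cross_step(x0, x1, W, bias)  # second layer: same x0 -> [5.5, 33.0]
```

Note that x0 stays fixed across the stack while only the second argument changes, which matches the description of the two inputs above.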


  1. R. Wang et al. See Eq. (1) for full-rank and Eq. (2) for low-rank version.
  2. R. Wang et al.


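# After the embedding layer in a functional model.
# Assumes Cross is the TensorFlow Recommenders layer tfrs.layers.dcn.Cross.
import tensorflow as tf
import tensorflow_recommenders as tfrs
Cross = tfrs.layers.dcn.Cross

input = tf.keras.Input(shape=(None,), name='index', dtype=tf.int64)
# Call the Embedding layer on the input so that x0 is a tensor, not a layer object.
x0 = tf.keras.layers.Embedding(input_dim=32, output_dim=6)(input)
x1 = Cross()(x0, x0)
x2 = Cross()(x0, x1)
logits = tf.keras.layers.Dense(units=10)(x2)
model = tf.keras.Model(input, logits)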

projection_dim Projection dimension used to reduce the computational cost. Defaults to None, in which case a full (input_dim by input_dim) matrix W is used. If set, a low-rank matrix W = U*V is used, where U is of size input_dim by projection_dim and V is of size projection_dim by input_dim. projection_dim needs to be smaller than input_dim/2 to improve model efficiency. In practice, we've observed that projection_dim = d/4 (with d = input_dim) consistently preserved the accuracy of the full-rank version.
diag_scale A non-negative float used to increase the diagonal of the kernel W by diag_scale, that is, W + diag_scale * I, where I is the identity matrix.
use_bias Whether to add a bias term for this layer. If set to False, no bias term will be used.
kernel_initializer Initializer to use on the kernel matrix.
bias_initializer Initializer to use on the bias vector.
kernel_regularizer Regularizer to use on the kernel matrix.
bias_regularizer Regularizer to use on the bias vector.
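The input_dim/2 threshold for projection_dim follows from counting kernel weights: a full W has input_dim * input_dim entries, while U and V together have 2 * input_dim * projection_dim. The sketch below works through that arithmetic and also shows what W + diag_scale * I does to a small matrix. The helper names kernel_params and scaled_kernel are hypothetical and are not part of the layer's API.

```python
def kernel_params(input_dim, projection_dim=None):
    """Number of kernel weights in one cross layer (bias excluded)."""
    if projection_dim is None:
        # full W: input_dim by input_dim
        return input_dim * input_dim
    # low-rank W = U*V: U is input_dim by projection_dim,
    # V is projection_dim by input_dim
    return 2 * input_dim * projection_dim


def scaled_kernel(W, diag_scale):
    """Return W + diag_scale * I for a square matrix given as nested lists."""
    return [
        [w + (diag_scale if r == c else 0.0) for c, w in enumerate(row)]
        for r, row in enumerate(W)
    ]


d = 64
full = kernel_params(d)              # 64 * 64 = 4096
quarter = kernel_params(d, d // 4)   # 2 * 64 * 16 = 2048, half the weights
half = kernel_params(d, d // 2)      # 2 * 64 * 32 = 4096, no saving

W_scaled = scaled_kernel([[1.0, 2.0], [3.0, 4.0]], 0.5)
# only the diagonal changes: [[1.5, 2.0], [3.0, 4.5]]
```

With projection_dim equal to input_dim/2 the low-rank form has exactly as many weights as the full matrix, so any saving requires a strictly smaller value.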

Input shape: A tuple of two inputs, each of shape (batch_size, input_dim).

Output shape: A single output of shape (batch_size, input_dim).




Computes the feature cross.

x0 The input tensor.
x Optional second input tensor. If provided, the layer will compute crosses between x0 and x; if not provided, the layer will compute crosses between x0 and itself.

Tensor of crosses.
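To illustrate the optional second argument, here is a small dependency-free sketch of the call semantics for a single example. The function call_sketch is a hypothetical stand-in, not the layer's method, and the identity kernel and zero bias are assumptions chosen so that each step reduces to x0 .* x + x and the results are easy to check by hand.

```python
def call_sketch(x0, x=None, *, W, bias):
    """Hypothetical stand-in for the layer's call on one example.

    Crosses x0 with x, or with x0 itself when x is not provided.
    """
    if x is None:
        x = x0
    d = len(x0)
    return [
        x0[r] * (sum(W[r][c] * x[c] for c in range(d)) + bias[r]) + x[r]
        for r in range(d)
    ]


identity = [[1.0, 0.0],
            [0.0, 1.0]]   # assumed kernel: W * x = x
zero_bias = [0.0, 0.0]    # assumed bias

x0 = [2.0, 3.0]
first = call_sketch(x0, W=identity, bias=zero_bias)          # x omitted
same = call_sketch(x0, x0, W=identity, bias=zero_bias)       # x given as x0
second = call_sketch(x0, first, W=identity, bias=zero_bias)  # stacked step
# first == same == [6.0, 12.0]; second == [18.0, 48.0]
```

In this toy setting the first output is x0^2 + x0 per feature and the second is x0^3 + 2*x0^2 + x0, so each additional step raises the polynomial degree in x0 by one, which is the sense in which the interactions are bounded-degree.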