text.pad_model_inputs

Pad model input and generate corresponding input masks.

text.pad_model_inputs(
    input, max_seq_length, pad_value=0
)

Used in the guide: BERT Preprocessing with TF Text

`pad_model_inputs` performs the final packaging step for inputs to text models. It pads (or simply truncates) the input to a fixed-size Tensor of at most 2 dimensions and generates a mask Tensor of the same shape, whose values are 0 where the corresponding item is a pad value and 1 where it is part of the original input.

Note that a simple truncation strategy (everything after `max_seq_length` is dropped) is used to force the inputs to the specified shape. This can silently discard parts of the input that should be kept, so users should instead apply a Trimmer upstream to safely truncate large inputs.
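To make the truncation caveat concrete, the following pure-Python sketch compares naive truncation with trimming before the special tokens are added. It is an illustration only: `naive_truncate` and `trim_then_combine` are hypothetical helpers, the ids 101 and 102 are assumed to be BERT-style start and separator tokens, and the trimmers shipped with TF Text may use different strategies.

```python
CLS, SEP = 101, 102  # assumed BERT-style special token ids


def naive_truncate(tokens, max_seq_length):
    """Drop everything after max_seq_length, as simple truncation does."""
    return tokens[:max_seq_length]


def trim_then_combine(segment_a, segment_b, max_seq_length):
    """Hypothetical trimmer: shorten the longer segment first, then add special tokens."""
    if max_seq_length < 3:
        raise ValueError("max_seq_length must leave room for three special tokens")
    budget = max_seq_length - 3  # room for CLS and two SEP tokens
    a, b = list(segment_a), list(segment_b)
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    return [CLS] + a + [SEP] + b + [SEP]


segment_a = [3, 4]
segment_b = [30, 40, 50, 60, 70, 80]

# Special tokens added first, then truncated: the closing SEP is lost.
combined = [CLS] + segment_a + [SEP] + segment_b + [SEP]
truncated = naive_truncate(combined, 9)

# Trimmed first, then combined: the closing SEP survives.
trimmed = trim_then_combine(segment_a, segment_b, 9)

print(truncated)
print(trimmed)
```

Both results have length 9, but only the trimmed sequence still ends with the separator token.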

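import numpy as np
import tensorflow as tf
import tensorflow_text as text

# Three rows of token ids with different lengths (7, 10, and 8).
input_data = tf.ragged.constant([
           [101, 1, 2, 102, 10, 20, 102],
           [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
           [101, 5, 6, 7, 8, 9, 102, 70],
       ], np.int32)
# Pad or truncate every row to length 9 and build the matching mask.
data, mask = text.pad_model_inputs(input=input_data, max_seq_length=9)
print("data: %s, mask: %s" % (data, mask))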
  data: tf.Tensor(
  [[101   1   2 102  10  20 102   0   0]
   [101   3   4 102  30  40  50  60  70]
   [101   5   6   7   8   9 102  70   0]], shape=(3, 9), dtype=int32),
  mask: tf.Tensor(
  [[1 1 1 1 1 1 1 0 0]
   [1 1 1 1 1 1 1 1 1]
   [1 1 1 1 1 1 1 1 0]], shape=(3, 9), dtype=int32)
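
The pad, truncate, and mask semantics shown above can be sketched in plain Python. This is a minimal stand-in rather than the TF Text implementation: `pad_and_mask` is a hypothetical helper, it handles only a list of 1-D rows, and it assumes the mask is determined by position (whether a slot came from the original row).

```python
def pad_and_mask(rows, max_seq_length, pad_value=0):
    """Hypothetical pure-Python stand-in, not the TF Text implementation."""
    padded, mask = [], []
    for row in rows:
        kept = list(row[:max_seq_length])  # simple truncation
        n_pad = max_seq_length - len(kept)  # slots still to fill
        padded.append(kept + [pad_value] * n_pad)
        mask.append([1] * len(kept) + [0] * n_pad)
    return padded, mask


rows = [
    [101, 1, 2, 102, 10, 20, 102],
    [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
    [101, 5, 6, 7, 8, 9, 102, 70],
]
data, mask = pad_and_mask(rows, max_seq_length=9)
for d, m in zip(data, mask):
    print(d, m)
```

With the same rows and `max_seq_length=9`, the sketch yields the same values as the output above: the first and third rows are padded, and the second row loses its final item.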

Args
`input`	A `RaggedTensor` or `Tensor` with rank >= 1.
`max_seq_length`	An int, or scalar `Tensor`. The `input` `Tensor` will be flattened down to 2 dimensions (if needed), and then have its inner dimension either padded out or truncated to this size.
`pad_value`	An int or scalar `Tensor` specifying the value used for padding.

Returns
A tuple of (padded_input, pad_mask) where:
`padded_input`	A `Tensor` corresponding to `input` that has been padded or truncated to a fixed size and flattened to at most 2 dimensions.
`pad_mask`	A `Tensor` corresponding to `padded_input` whose values are 0 if the corresponding item is a pad value and 1 if it is not.
