tf.keras.utils.pad_sequences

Pads sequences to the same length.

Used in the notebooks

Used in the guide Used in the tutorials

This function transforms a list (of length num_samples) of sequences (lists of integers) into a 2D Numpy array of shape (num_samples, num_timesteps). num_timesteps is either the maxlen argument if provided, or the length of the longest sequence in the list.

Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.

Sequences longer than num_timesteps are truncated so that they fit the desired length.

The position where padding or truncation happens is determined by the arguments padding and truncating, respectively. Pre-padding or removing values from the beginning of the sequence is the default.

sequence = [[1], [2, 3], [4, 5, 6]]
tf.keras.utils.pad_sequences(sequence)
array([[0, 0, 1],
       [0, 2, 3],
       [4, 5, 6]], dtype=int32)
tf.keras.utils.pad_sequences(sequence, value=-1)
array([[-1, -1,  1],
       [-1,  2,  3],
       [ 4,  5,  6]], dtype=int32)
tf.keras.utils.pad_sequences(sequence, padding='post')
array([[1, 0, 0],
       [2, 3, 0],
       [4, 5, 6]], dtype=int32)
tf.keras.utils.pad_sequences(sequence, maxlen=2)
array([[0, 1],
       [2, 3],
       [5, 6]], dtype=int32)

sequences List of sequences (each sequence is a list of integers).
maxlen Optional Int, maximum length of all sequences. If not provided, sequences will be padded to the length of the longest individual sequence.
dtype (Optional). Type of the output sequences. To pad sequences with variable length strings, you can use object. Defaults to "int32".
padding String, "pre" or "post" (optional): pad either before or after each sequence. Defaults to "pre".
truncating String, "pre" or "post" (optional): remove values from sequences larger than maxlen, either at the beginning or at the end of the sequences. Defaults to "pre".
value Float or String, padding value. (Optional). Defaults to 0..

Numpy array with shape (len(sequences), maxlen)

ValueError In case of invalid values for truncating or padding, or in case of invalid shape for a sequences entry.