tf.keras.preprocessing.sequence.pad_sequences

Pads sequences to the same length.

tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre',
    value=0.0
)

This function transforms a list of num_samples sequences (each a list of integers) into a 2D NumPy array of shape (num_samples, num_timesteps). num_timesteps is the maxlen argument if provided; otherwise it is the length of the longest sequence in the list.

Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.

Sequences longer than num_timesteps are truncated so that they fit the desired length.

The arguments padding and truncating determine where padding and truncation happen, respectively. Both default to 'pre': padding values are added at the beginning of a sequence, and excess values are removed from the beginning.

>>> sequence = [[1], [2, 3], [4, 5, 6]]
>>> tf.keras.preprocessing.sequence.pad_sequences(sequence)
array([[0, 0, 1],
       [0, 2, 3],
       [4, 5, 6]], dtype=int32)
>>> tf.keras.preprocessing.sequence.pad_sequences(sequence, value=-1)
array([[-1, -1,  1],
       [-1,  2,  3],
       [ 4,  5,  6]], dtype=int32)
>>> tf.keras.preprocessing.sequence.pad_sequences(sequence, padding='post')
array([[1, 0, 0],
       [2, 3, 0],
       [4, 5, 6]], dtype=int32)
>>> tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=2)
array([[0, 1],
       [2, 3],
       [5, 6]], dtype=int32)
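
The padding and truncation rules described above can be sketched in plain Python. This is an illustration only, not the library's implementation: pad_sequences_sketch is a hypothetical helper, it returns nested lists rather than a NumPy array, and it ignores the dtype argument.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding="pre",
                         truncating="pre", value=0):
    """Hypothetical pure-Python illustration of the pad/truncate rules."""
    if padding not in ("pre", "post"):
        raise ValueError(f"Unknown padding mode: {padding!r}")
    if truncating not in ("pre", "post"):
        raise ValueError(f"Unknown truncating mode: {truncating!r}")

    # Without maxlen, the target length is that of the longest sequence.
    if maxlen is None:
        maxlen = max((len(s) for s in sequences), default=0)

    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' removes values from the beginning, 'post' from the end.
            if truncating == "pre":
                seq = seq[len(seq) - maxlen:]
            else:
                seq = seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' places the padding before the sequence, 'post' after it.
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result


sequence = [[1], [2, 3], [4, 5, 6]]
print(pad_sequences_sketch(sequence))            # [[0, 0, 1], [0, 2, 3], [4, 5, 6]]
print(pad_sequences_sketch(sequence, maxlen=2))  # [[0, 1], [2, 3], [5, 6]]
```

The two printed results match the first and last calls in the example above, which makes the default 'pre' behavior for both padding and truncation easy to check by hand.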

Arguments:

  • sequences: List of sequences (each sequence is a list of integers).
  • maxlen: Optional int, the length to which all sequences are padded or truncated. If not provided, sequences are padded to the length of the longest individual sequence.
  • dtype: (Optional, defaults to int32). Type of the output sequences. To pad sequences of variable-length strings, use dtype=object.
  • padding: String, 'pre' or 'post' (optional, defaults to 'pre'): pad either before or after each sequence.
  • truncating: String, 'pre' or 'post' (optional, defaults to 'pre'): remove values from sequences longer than maxlen, either at the beginning or at the end of the sequences.
  • value: Float or String, padding value (optional, defaults to 0.0).
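
As a complement to the arguments listed above, the following short sketch exercises truncating='post' and the dtype=object option for string sequences. It assumes a TensorFlow installation that exposes tf.keras.preprocessing.sequence.pad_sequences; the alias pad, the token lists, and the '<pad>' marker are illustrative choices, not part of the API.

```python
import tensorflow as tf

# Short alias for readability (illustrative, not part of the API).
pad = tf.keras.preprocessing.sequence.pad_sequences

sequence = [[1], [2, 3], [4, 5, 6]]

# truncating='post' removes values from the end of sequences longer than maxlen,
# while the default padding='pre' still pads shorter sequences at the beginning.
truncated = pad(sequence, maxlen=2, truncating="post")
print(truncated.tolist())  # [[0, 1], [2, 3], [4, 5]]

# dtype=object together with a string padding value pads sequences of strings.
tokens = [["a"], ["b", "c"]]
padded_tokens = pad(tokens, dtype=object, padding="post", value="<pad>")
print(padded_tokens.tolist())
```

Compare the last row of the first result with the maxlen=2 call in the example above: the default truncating='pre' keeps [5, 6], whereas truncating='post' keeps [4, 5].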

Returns:

NumPy array with shape (len(sequences), maxlen). If maxlen is not provided, the second dimension is the length of the longest sequence.

Raises:

  • ValueError: In case of invalid values for truncating or padding, or in case of invalid shape for a sequences entry.