tf.RaggedTensor

TensorFlow 1 version View source on GitHub

Represents a ragged tensor.

A RaggedTensor is a tensor with one or more ragged dimensions, which are dimensions whose slices may have different lengths. For example, the inner (column) dimension of rt=[[3, 1, 4, 1], [], [5, 9, 2], [6], []] is ragged, since the column slices (rt[0, :], ..., rt[4, :]) have different lengths. Dimensions whose slices all have the same length are called uniform dimensions. The outermost dimension of a RaggedTensor is always uniform, since it consists of a single slice (and so there is no possibility for differing slice lengths).

The total number of dimensions in a RaggedTensor is called its rank, and the number of ragged dimensions in a RaggedTensor is called its ragged-rank. A RaggedTensor's ragged-rank is fixed at graph creation time: it can't depend on the runtime values of Tensors, and can't vary dynamically for different session runs.

Potentially Ragged Tensors

Many ops support both Tensors and RaggedTensors. The term "potentially ragged tensor" may be used to refer to a tensor that might be either a Tensor or a RaggedTensor. The ragged-rank of a Tensor is zero.

Documenting RaggedTensor Shapes

When documenting the shape of a RaggedTensor, ragged dimensions can be indicated by enclosing them in parentheses. For example, the shape of a 3-D RaggedTensor that stores the fixed-size word embedding for each word in a sentence, for each sentence in a batch, could be written as [num_sentences, (num_words), embedding_size]. The parentheses around (num_words) indicate that dimension is ragged, and that the length of each element list in that dimension may vary for each item.

Component Tensors

Internally, a RaggedTensor consists of a concatenated list of values that are partitioned into variable-length rows. In particular, each RaggedTensor consists of:

  • A values tensor, which concatenates the variable-length rows into a flattened list. For example, the values tensor for [[3, 1, 4, 1], [], [5, 9, 2], [6], []] is [3, 1, 4, 1, 5, 9, 2, 6].

  • A row_splits vector, which indicates how those flattened values are divided into rows. In particular, the values for row rt[i] are stored in the slice rt.values[rt.row_splits[i]:rt.row_splits[i+1]].

Example:

print(tf.RaggedTensor.from_row_splits(
      values=[3, 1, 4, 1, 5, 9, 2, 6],
      row_splits=[0, 4, 4, 7, 8, 8]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>

Alternative Row-Partitioning Schemes

In addition to row_splits, ragged tensors provide support for five other row-partitioning schemes:

  • row_lengths: a vector with shape [nrows], which specifies the length of each row.

  • value_rowids and nrows: value_rowids is a vector with shape [nvals], corresponding one-to-one with values, which specifies each value's row index. In particular, the row rt[row] consists of the values rt.values[j] where value_rowids[j]==row. nrows is an integer scalar that specifies the number of rows in the RaggedTensor. (nrows is used to indicate trailing empty rows.)

  • row_starts: a vector with shape [nrows], which specifies the start offset of each row. Equivalent to row_splits[:-1].

  • row_limits: a vector with shape [nrows], which specifies the stop offset of each row. Equivalent to row_splits[1:].

  • uniform_row_length: A scalar tensor, specifying the length of every row. This row-partitioning scheme may only be used if all rows have the same length.

Example: The following ragged tensors are equivalent, and all represent the nested list [[3, 1, 4, 1], [], [5, 9, 2], [6], []].

values = [3, 1, 4, 1, 5, 9, 2, 6]
rt1 = RaggedTensor.from_row_splits(values, row_splits=[0, 4, 4, 7, 8, 8])
rt2 = RaggedTensor.from_row_lengths(values, row_lengths=[4, 0, 3, 1, 0])
rt3 = RaggedTensor.from_value_rowids(
    values, value_rowids=[0, 0, 0, 0, 2, 2, 2, 3], nrows=5)
rt4 = RaggedTensor.from_row_starts(values, row_starts=[0, 4, 4, 7, 8])
rt5 = RaggedTensor.from_row_limits(values, row_limits=[4, 4, 7, 8, 8])

Multiple Ragged Dimensions

RaggedTensors with multiple ragged dimensions can be defined by using a nested RaggedTensor for the values tensor. Each nested RaggedTensor adds a single ragged dimension.

inner_rt = RaggedTensor.from_row_splits(  # =rt1 from above
    values=[3, 1, 4, 1, 5, 9, 2, 6], row_splits=[0, 4, 4, 7, 8, 8])
outer_rt = RaggedTensor.from_row_splits(
    values=inner_rt, row_splits=[0, 3, 3, 5])
print(outer_rt.to_list())
[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]
print(outer_rt.ragged_rank)
2

The factory function RaggedTensor.from_nested_row_splits may be used to construct a RaggedTensor with multiple ragged dimensions directly, by providing a list of row_splits tensors:

RaggedTensor.from_nested_row_splits(
    flat_values=[3, 1, 4, 1, 5, 9, 2, 6],
    nested_row_splits=([0, 3, 3, 5], [0, 4, 4, 7, 8, 8])).to_list()
[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]

Uniform Inner Dimensions

RaggedTensors with uniform inner dimensions can be defined by using a multidimensional Tensor for values.

rt = RaggedTensor.from_row_splits(values=tf.ones([5, 3], tf.int32),
                                  row_splits=[0, 2, 5])
print(rt.to_list())
[[[1, 1, 1], [1, 1, 1]],
 [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
print(rt.shape)
(2, None, 3)

Uniform Outer Dimensions

RaggedTensors with uniform outer dimensions can be defined by using one or more RaggedTensor with a uniform_row_length row-partitioning tensor. For example, a RaggedTensor with shape [2, 2, None] can be constructed with this method from a RaggedTensor values with shape [4, None]:

values = tf.ragged.constant([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]])
print(values.shape)
(4, None)
rt6 = tf.RaggedTensor.from_uniform_row_length(values, 2)
print(rt6)
<tf.RaggedTensor [[[1, 2, 3], [4]], [[5, 6], [7, 8, 9, 10]]]>
print(rt6.shape)
(2, 2, None)

Note that rt6 only contains one ragged dimension (the innermost dimension). In contrast, if from_row_splits is used to construct a similar RaggedTensor, then that RaggedTensor will have two ragged dimensions:

rt7 = tf.RaggedTensor.from_row_splits(values, [0, 2, 4])
print(rt7.shape)
(2, None, None)

Uniform and ragged outer dimensions may be interleaved, meaning that a tensor with any combination of ragged and uniform dimensions may be created. For example, a RaggedTensor t4 with shape [3, None, 4, 8, None, 2] could be constructed as follows:

t0 = tf.zeros([1000, 2])                           # Shape:         [1000, 2]
t1 = RaggedTensor.from_row_lengths(t0, [...])      #           [160, None, 2]
t2 = RaggedTensor.from_uniform_row_length(t1, 8)   #         [20, 8, None, 2]
t3 = RaggedTensor.from_uniform_row_length(t2, 4)   #       [5, 4, 8, None, 2]
t4 = RaggedTensor.from_row_lengths(t3, [...])      # [3, None, 4, 8, None, 2]

values A potentially ragged tensor of any dtype and shape [nvals, ...].
row_partition A RowPartition object, representing the arrangement of the lists at the top level.
internal True if the constructor is being called by one of the factory methods. If false, an exception will be raised.

ValueError If internal = False. Note that this method is intended only for internal use.
TypeError If values is not a RaggedTensor or Tensor, or row_partition is not a RowPartition.

dtype The DType of values in this tensor.
flat_values The innermost values tensor for this ragged tensor.

Concretely, if rt.values is a Tensor, then rt.flat_values is rt.values; otherwise, rt.flat_values is rt.values.flat_values.

Conceptually, flat_values is the tensor formed by flattening the outermost dimension and all of the ragged dimensions into a single dimension.

rt.flat_values.shape = [nvals] + rt.shape[rt.ragged_rank + 1:] (where nvals is the number of items in the flattened dimensions).

Example:

rt = tf.ragged.constant([[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]])
print(rt.flat_values)
tf.Tensor([3 1 4 1 5 9 2 6], shape=(8,), dtype=int32)

nested_row_splits A tuple containing the row_splits for all ragged dimensions.

rt.nested_row_splits is a tuple containing the row_splits tensors for all ragged dimensions in rt, ordered from outermost to innermost. In particular, rt.nested_row_splits = (rt.row_splits,) + value_splits where:

  • value_splits = () if rt.values is a Tensor.
  • value_splits = rt.values.nested_row_splits otherwise.

Example:

rt = tf.ragged.constant(
    [[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]])
for i, splits in enumerate(rt.nested_row_splits):
  print('Splits for dimension %d: %s' % (i+1, splits.numpy()))
Splits for dimension 1: [0 3]
Splits for dimension 2: [0 3 3 5]
Splits for dimension 3: [0 4 4 7 8 8]

ragged_rank The number of ragged dimensions in this ragged tensor.
row_splits The row-split indices for this ragged tensor's values.

rt.row_splits specifies where the values for each row begin and end in rt.values. In particular, the values for row rt[i] are stored in the slice rt.values[rt.row_splits[i]:rt.row_splits[i+1]].

Example:

rt = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
print(rt.row_splits)  # indices of row splits in rt.values
tf.Tensor([0 4 4 7 8 8], shape=(6,), dtype=int64)

shape The statically known shape of this ragged tensor.

tf.ragged.constant([[0], [1, 2]]).shape
TensorShape([2, None])
tf.ragged.constant([[[0, 1]], [[1, 2], [3, 4]]], ragged_rank=1).shape
TensorShape([2, None, 2])

uniform_row_length The length of each row in this ragged tensor, or None if rows are ragged.

rt1 = tf.ragged.constant([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]])
print(rt1.uniform_row_length)  # rows are ragged.
None
rt2 = tf.RaggedTensor.from_uniform_row_length(
    values=rt1, uniform_row_length=2)
print(rt2)
<tf.RaggedTensor [[[1, 2, 3], [4]], [[5, 6], [7, 8, 9, 10]]]>
print(rt2.uniform_row_length)  # rows are not ragged (all have size 2).
tf.Tensor(2, shape=(), dtype=int64)

A RaggedTensor's rows are only considered to be uniform (i.e. non-ragged) if it can be determined statically (at graph construction time) that the rows all have the same length.

values