tf.RaggedTensor

Represents a ragged tensor.

A RaggedTensor is a tensor with one or more ragged dimensions, which are dimensions whose slices may have different lengths. For example, the inner (column) dimension of rt=[[3, 1, 4, 1], [], [5, 9, 2], [6], []] is ragged, since the column slices (rt[0, :], ..., rt[4, :]) have different lengths. Dimensions whose slices all have the same length are called uniform dimensions. The outermost dimension of a RaggedTensor is always uniform, since it consists of a single slice (and so there is no possibility for differing slice lengths).

The total number of dimensions in a RaggedTensor is called its rank, and the number of ragged dimensions in a RaggedTensor is called its ragged-rank. A RaggedTensor's ragged-rank is fixed at graph creation time: it can't depend on the runtime values of Tensors, and can't vary dynamically for different session runs.

Potentially Ragged Tensors

Many ops support both Tensors and RaggedTensors. The term "potentially ragged tensor" may be used to refer to a tensor that might be either a Tensor or a RaggedTensor. The ragged-rank of a Tensor is zero.

Documenting RaggedTensor Shapes

When documenting the shape of a RaggedTensor, ragged dimensions can be indicated by enclosing them in parentheses. For example, the shape of a 3-D RaggedTensor that stores the fixed-size word embedding for each word in a sentence, for each sentence in a batch, could be written as [num_sentences, (num_words), embedding_size]. The parentheses around (num_words) indicate that dimension is ragged, and that the length of each element list in that dimension may vary for each item.

Component Tensors

Internally, a RaggedTensor consists of a concatenated list of values that are partitioned into variable-length rows. In particular, each RaggedTensor consists of:

  • A values tensor, which concatenates the variable-length rows into a flattened list. For example, the values tensor for [[3, 1, 4, 1], [], [5, 9, 2], [6], []] is [3, 1, 4, 1, 5, 9, 2, 6].

  • A row_splits vector, which indicates how those flattened values are divided into rows. In particular, the values for row rt[i] are stored in the slice rt.values[rt.row_splits[i]:rt.row_splits[i+1]].

Example:

print(tf.RaggedTensor.from_row_splits(
    values=[3, 1, 4, 1, 5, 9, 2, 6],
    row_splits=[0, 4, 4, 7, 8, 8]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>
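The row_splits rule can be modeled in plain Python to see how the flat values and the splits vector combine. This is a minimal sketch that works on Python lists rather than tensors. `decode_row_splits` is a hypothetical helper written for illustration and is not part of the tf.RaggedTensor API.

```python
def decode_row_splits(values, row_splits):
    """Hypothetical helper: rebuild the rows described by row_splits.

    Row i is the slice values[row_splits[i]:row_splits[i + 1]].
    """
    return [values[start:limit]
            for start, limit in zip(row_splits[:-1], row_splits[1:])]


rows = decode_row_splits([3, 1, 4, 1, 5, 9, 2, 6], [0, 4, 4, 7, 8, 8])
print(rows)  # [[3, 1, 4, 1], [], [5, 9, 2], [6], []]
```

Because consecutive split points can be equal (4, 4 and 8, 8 above), empty rows cost nothing in the values tensor.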

Alternative Row-Partitioning Schemes

In addition to row_splits, ragged tensors provide support for four other row-partitioning schemes:

  • row_lengths: a vector with shape [nrows], which specifies the length of each row.

  • value_rowids and nrows: value_rowids is a vector with shape [nvals], corresponding one-to-one with values, which specifies each value's row index. In particular, the row rt[row] consists of the values rt.values[j] where value_rowids[j]==row. nrows is an integer scalar that specifies the number of rows in the RaggedTensor. (nrows is used to indicate trailing empty rows.)

  • row_starts: a vector with shape [nrows], which specifies the start offset of each row. Equivalent to row_splits[:-1].

  • row_limits: a vector with shape [nrows], which specifies the stop offset of each row. Equivalent to row_splits[1:].

Example: The following ragged tensors are equivalent, and all represent the nested list [[3, 1, 4, 1], [], [5, 9, 2], [6], []].

import tensorflow as tf

values = [3, 1, 4, 1, 5, 9, 2, 6]
rt1 = tf.RaggedTensor.from_row_splits(values, row_splits=[0, 4, 4, 7, 8, 8])
rt2 = tf.RaggedTensor.from_row_lengths(values, row_lengths=[4, 0, 3, 1, 0])
rt3 = tf.RaggedTensor.from_value_rowids(
    values, value_rowids=[0, 0, 0, 0, 2, 2, 2, 3], nrows=5)
rt4 = tf.RaggedTensor.from_row_starts(values, row_starts=[0, 4, 4, 7, 8])
rt5 = tf.RaggedTensor.from_row_limits(values, row_limits=[4, 4, 7, 8, 8])
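These five constructions agree because each alternative encoding can be derived from row_splits. The sketch below performs those derivations on plain Python lists. `partitions_from_row_splits` is a hypothetical helper for illustration, not a TensorFlow function.

```python
def partitions_from_row_splits(row_splits):
    """Hypothetical helper: derive the alternative row encodings from row_splits."""
    nrows = len(row_splits) - 1
    row_starts = row_splits[:-1]   # start offset of each row
    row_limits = row_splits[1:]    # stop offset of each row
    row_lengths = [limit - start for start, limit in zip(row_starts, row_limits)]
    # Row i contributes row_lengths[i] copies of i, one per value it holds.
    value_rowids = [row for row, length in enumerate(row_lengths)
                    for _ in range(length)]
    return {
        "row_lengths": row_lengths,
        "value_rowids": value_rowids,
        "nrows": nrows,
        "row_starts": row_starts,
        "row_limits": row_limits,
    }


parts = partitions_from_row_splits([0, 4, 4, 7, 8, 8])
print(parts["row_lengths"])   # [4, 0, 3, 1, 0]
print(parts["value_rowids"])  # [0, 0, 0, 0, 2, 2, 2, 3]
```

Note that value_rowids alone cannot express the trailing empty row, which is why nrows travels with it.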

Multiple Ragged Dimensions

RaggedTensors with multiple ragged dimensions can be defined by using a nested RaggedTensor for the values tensor. Each nested RaggedTensor adds a single ragged dimension.

inner_rt = tf.RaggedTensor.from_row_splits(  # =rt1 from above
    values=[3, 1, 4, 1, 5, 9, 2, 6], row_splits=[0, 4, 4, 7, 8, 8])
outer_rt = tf.RaggedTensor.from_row_splits(
    values=inner_rt, row_splits=[0, 3, 3, 5])
print(outer_rt.to_list())
[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]
print(outer_rt.ragged_rank)
2

The factory function RaggedTensor.from_nested_row_splits may be used to construct a RaggedTensor with multiple ragged dimensions directly, by providing a list of row_splits tensors:

tf.RaggedTensor.from_nested_row_splits(
    flat_values=[3, 1, 4, 1, 5, 9, 2, 6],
    nested_row_splits=([0, 3, 3, 5], [0, 4, 4, 7, 8, 8])).to_list()
[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]
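The nested construction applies the splits from the innermost dimension outward, so each pass groups the rows produced by the previous pass. A minimal plain-Python sketch of that loop, using a hypothetical helper `decode_nested_row_splits` that is not part of TensorFlow:

```python
def decode_nested_row_splits(flat_values, nested_row_splits):
    """Hypothetical helper: apply row_splits from innermost to outermost."""
    result = list(flat_values)
    for row_splits in reversed(nested_row_splits):
        # Group the current list into rows using this level's split points.
        result = [result[start:limit]
                  for start, limit in zip(row_splits[:-1], row_splits[1:])]
    return result


print(decode_nested_row_splits(
    [3, 1, 4, 1, 5, 9, 2, 6],
    ([0, 3, 3, 5], [0, 4, 4, 7, 8, 8])))
# [[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]
```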

Uniform Inner Dimensions

RaggedTensors with uniform inner dimensions can be defined by using a multidimensional Tensor for values.

rt = tf.RaggedTensor.from_row_splits(values=tf.ones([5, 3]),
                                     row_splits=[0, 2, 5])
print(rt.to_list())
[[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
 [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]
print(rt.shape)
(2, None, 3)

RaggedTensor Shape Restrictions

The shape of a RaggedTensor is currently restricted to have the following form:

  • A single uniform dimension
  • Followed by one or more ragged dimensions
  • Followed by zero or more uniform dimensions.

This restriction follows from the fact that each nested RaggedTensor replaces the uniform outermost dimension of its values with a uniform dimension followed by a ragged dimension.
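The restriction can be traced by following how the static shape changes each time a values tensor is partitioned. The sketch below models shapes as Python lists, with None standing for a ragged dimension; `wrap_shape` is a hypothetical helper, not a TensorFlow function.

```python
def wrap_shape(values_shape, row_splits):
    """Hypothetical helper: static shape after partitioning values by row_splits.

    The outermost dimension of values (nvals) is replaced by a uniform
    dimension (nrows) followed by a ragged dimension (None).
    """
    nvals = values_shape[0]
    if nvals is not None and nvals != row_splits[-1]:
        raise ValueError("row_splits[-1] must equal the number of values")
    nrows = len(row_splits) - 1
    return [nrows, None] + list(values_shape[1:])


# Uniform inner dimension: values of shape [5, 3] split into 2 rows.
print(wrap_shape([5, 3], [0, 2, 5]))  # [2, None, 3]

# Two ragged dimensions: partition 8 scalars, then partition the 5 resulting rows.
inner = wrap_shape([8], [0, 4, 4, 7, 8, 8])  # [5, None]
print(wrap_shape(inner, [0, 3, 3, 5]))       # [3, None, None]
```

Each call only ever prepends "uniform, ragged" in front of the existing trailing dimensions, so no sequence of calls can place a uniform dimension between two ragged ones.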

Args
values A potentially ragged tensor of any dtype and shape [nvals, ...].
row_splits A 1-D integer tensor with shape [nrows+1].
cached_row_lengths A 1-D integer tensor with shape [nrows].
cached_value_rowids A 1-D integer tensor with shape [nvals].
cached_nrows An integer scalar tensor.
internal True if the constructor is being called by one of the factory methods. If false, an exception will be raised.

Raises
TypeError If a row partitioning tensor has an inappropriate dtype.
TypeError If not exactly one row partitioning argument was specified.
ValueError If a row partitioning tensor has an inappropriate shape.
ValueError If multiple partitioning arguments are specified.
ValueError If nrows is specified but value_rowids is None.

Attributes
dtype The DType of values in this tensor.
flat_values The innermost values tensor for this ragged tensor.

Concretely, if rt.values is a Tensor, then rt.flat_values is rt.values; otherwise, rt.flat_values is rt.values.flat_values.

Conceptually, flat_values is the tensor formed by flattening the outermost dimension and all of the ragged dimensions into a single dimension.

rt.flat_values.shape = [nvals] + rt.shape[rt.ragged_rank + 1:] (where nvals is the number of items in the flattened dimensions).

Example:

rt = tf.ragged.constant([[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]])
print(rt.flat_values)
tf.Tensor([3 1 4 1 5 9 2 6], shape=(8,), dtype=int32)
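Conceptually, computing flat_values merges one ragged dimension into the outer dimension per pass, ragged_rank times in total, while any uniform inner dimensions are left untouched. A plain-Python sketch of that idea, using a hypothetical helper `flatten_ragged` (not a TensorFlow function) on nested lists:

```python
def flatten_ragged(nested, ragged_rank):
    """Hypothetical helper: merge the outer ragged_rank + 1 dimensions into one."""
    result = list(nested)
    for _ in range(ragged_rank):
        # Each pass concatenates the rows, removing one ragged dimension.
        result = [item for row in result for item in row]
    return result


print(flatten_ragged([[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]], ragged_rank=2))
# [3, 1, 4, 1, 5, 9, 2, 6]
```

With a uniform inner dimension, such as the [5, 3] ones example earlier, flatten_ragged(..., ragged_rank=1) returns five rows of length 3, matching the formula [nvals] + rt.shape[rt.ragged_rank + 1:].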

nested_row_splits A tuple containing the row_splits for all ragged dimensions.

rt.nested_row_splits is a tuple containing the row_splits tensors for all ragged dimensions in rt, ordered from outermost to innermost. In particular, rt.nested_row_splits = (rt.row_splits,) + value_splits where:

  • value_splits = () if rt.values is a Tensor.
  • value_splits = rt.values.nested_row_splits otherwise.

Example:

rt = tf.ragged.constant([[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]])
for i, splits in enumerate(rt.nested_row_splits):
  print('Splits for dimension %d: %s' % (i+1, splits.numpy().tolist()))
Splits for dimension 1: [0, 3]
Splits for dimension 2: [0, 3, 3, 5]
Splits for dimension 3: [0, 4, 4, 7, 8, 8]
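To see where each of those splits vectors comes from, the sketch below computes them directly from a nested Python list: at each level, the running total of row lengths gives that level's row_splits, and the rows are then concatenated to descend one level. `nested_row_splits_of` is a hypothetical helper, not a TensorFlow function.

```python
def nested_row_splits_of(nested, ragged_rank):
    """Hypothetical helper: row_splits for each ragged dimension, outermost first."""
    splits = []
    level = list(nested)
    for _ in range(ragged_rank):
        row_splits = [0]
        for row in level:
            row_splits.append(row_splits[-1] + len(row))
        splits.append(row_splits)
        # Concatenate the rows so the next pass sees the next dimension.
        level = [item for row in level for item in row]
    return tuple(splits)


print(nested_row_splits_of([[[[3, 1, 4, 1], [], [5, 9, 2]], [], [[6], []]]], 3))
# ([0, 3], [0, 3, 3, 5], [0, 4, 4, 7, 8, 8])
```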

ragged_rank The number of ragged dimensions in this ragged tensor.
row_splits The row-split indices for this ragged tensor's values.

rt.row_splits specifies where the values for each row begin and end in rt.values. In particular, the values for row rt[i] are stored in the slice rt.values[rt.row_splits[i]:rt.row_splits[i+1]].

Example:

>>> rt = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
>>> print(rt.row_splits)  # indices of row splits in rt.values
tf.Tensor([0 4 4 7 8 8], shape=(6,), dtype=int64)

shape The statically known shape of this ragged tensor.
values The concatenated rows for this ragged tensor.

rt.values is a potentially ragged tensor formed by flattening the two outermost dimensions of rt into a single dimension.

rt.values.shape = [nvals] + rt.shape[2:] (where nvals is the number of items in the outer two dimensions of rt).

If rt.values is itself a RaggedTensor, then rt.values.ragged_rank = rt.ragged_rank - 1.

Example:

>>> rt = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
>>> print(rt.values)
tf.Tensor([3 1 4 1 5 9 2 6], shape=(8,), dtype=int32)

Methods

bounding_shape

Returns the tight bounding box shape for this RaggedTensor.

Args
axis An integer scalar or vector indicating which axes to return the bounding box for. If not specified, then the full bounding box is returned.
name A name prefix for the returned tensor (optional).
out_type dtype for the returned tensor. Defaults to self.row_splits.dtype.

Returns
An integer Tensor (dtype=self.row_splits.dtype). If axis is not specified, then output is a vector with output.shape=[self.shape.ndims]. If axis is a scalar, then the output is a scalar. If axis is a vector, then output is a vector, where output[i] is the bounding size for dimension axis[i].

Example:

>>> rt = tf.ragged.constant([[1, 2, 3, 4], [5], [], [6, 7, 8, 9], [10]])
>>> rt.bounding_shape().numpy().tolist()
[5, 4]
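The bounding size of the outer dimension is the number of rows, and the bounding size of each ragged dimension is the length of its longest row. A plain-Python sketch of that computation, assuming scalar flat values (no uniform inner dimensions); `bounding_shape_of` is a hypothetical helper, not the TensorFlow method:

```python
def bounding_shape_of(nested, ragged_rank):
    """Hypothetical helper: tight bounding box of a ragged nested list.

    Assumes the flat values are scalars, so only the outer dimension and
    the ragged dimensions are measured.
    """
    shape = [len(nested)]
    level = list(nested)
    for _ in range(ragged_rank):
        # The bound for a ragged dimension is its longest row (0 if there are no rows).
        shape.append(max((len(row) for row in level), default=0))
        level = [item for row in level for item in row]
    return shape


print(bounding_shape_of([[1, 2, 3, 4], [5], [], [6, 7, 8, 9], [10]], ragged_rank=1))
# [5, 4]
```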

consumers

from_nested_row_lengths

Creates a RaggedTensor from a nested list of row_lengths tensors.

Equivalent to:

result = flat_values
for row_lengths in reversed(nested_row_lengths):
  result = tf.RaggedTensor.from_row_lengths(result, row_lengths)

Args
flat_values A potentially ragged tensor.
nested_row_lengths A list of 1-D integer tensors. The ith tensor is used as the row_lengths for the ith ragged dimension.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor (or flat_values if nested_row_lengths is empty).
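Since row_lengths[i] = row_splits[i+1] - row_splits[i], each row_lengths vector maps to a row_splits vector by a running sum that starts at zero. The plain-Python sketch below shows that conversion for the nested example used earlier on this page; `splits_from_lengths` is a hypothetical helper, not a TensorFlow function.

```python
from itertools import accumulate


def splits_from_lengths(row_lengths):
    """Hypothetical helper: row_splits is 0 followed by the running sum of row_lengths."""
    if any(length < 0 for length in row_lengths):
        raise ValueError("row_lengths must be nonnegative")
    return [0] + list(accumulate(row_lengths))


nested_row_lengths = ([3, 0, 2], [4, 0, 3, 1, 0])
print([splits_from_lengths(lengths) for lengths in nested_row_lengths])
# [[0, 3, 3, 5], [0, 4, 4, 7, 8, 8]]
```

These are exactly the nested_row_splits passed to from_nested_row_splits in the Multiple Ragged Dimensions example, so both factories describe the same tensor.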

from_nested_row_splits

Creates a RaggedTensor from a nested list of row_splits tensors.

Equivalent to:

result = flat_values
for row_splits in reversed(nested_row_splits):
  result = tf.RaggedTensor.from_row_splits(result, row_splits)

Args
flat_values A potentially ragged tensor.
nested_row_splits A list of 1-D integer tensors. The ith tensor is used as the row_splits for the ith ragged dimension.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor (or flat_values if nested_row_splits is empty).

from_nested_value_rowids

Creates a RaggedTensor from a nested list of value_rowids tensors.

Equivalent to:

result = flat_values
for (rowids, nrows) in reversed(list(zip(nested_value_rowids, nested_nrows))):
  result = tf.RaggedTensor.from_value_rowids(result, rowids, nrows)

Args
flat_values A potentially ragged tensor.
nested_value_rowids A list of 1-D integer tensors. The ith tensor is used as the value_rowids for the ith ragged dimension.
nested_nrows A list of integer scalars. The ith scalar is used as the nrows for the ith ragged dimension.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor (or flat_values if nested_value_rowids is empty).

Raises
ValueError If len(nested_value_rowids) != len(nested_nrows).

from_row_lengths

Creates a RaggedTensor with rows partitioned by row_lengths.

The returned RaggedTensor corresponds to the Python list defined by:

result = [[values.pop(0) for i in range(length)]
          for length in row_lengths]

Args
values A potentially ragged tensor with shape [nvals, ...].
row_lengths A 1-D integer tensor with shape [nrows]. Must be nonnegative. sum(row_lengths) must be nvals.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor. result.rank = values.rank + 1. result.ragged_rank = values.ragged_rank + 1.

Example:

>>> print(tf.RaggedTensor.from_row_lengths(
...     values=[3, 1, 4, 1, 5, 9, 2, 6],
...     row_lengths=[4, 0, 3, 1, 0]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>

from_row_limits

Creates a RaggedTensor with rows partitioned by row_limits.

Equivalent to: from_row_splits(values, concat([0, row_limits])).

Args
values A potentially ragged tensor with shape [nvals, ...].
row_limits A 1-D integer tensor with shape [nrows]. Must be sorted in ascending order. If nrows>0, then row_limits[-1] must be nvals.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor. result.rank = values.rank + 1. result.ragged_rank = values.ragged_rank + 1.

Example:

>>> print(tf.RaggedTensor.from_row_limits(
...     values=[3, 1, 4, 1, 5, 9, 2, 6],
...     row_limits=[4, 4, 7, 8, 8]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>

from_row_splits

Creates a RaggedTensor with rows partitioned by row_splits.

The returned RaggedTensor corresponds to the Python list defined by:

result = [values[row_splits[i]:row_splits[i + 1]]
          for i in range(len(row_splits) - 1)]

Args
values A potentially ragged tensor with shape [nvals, ...].
row_splits A 1-D integer tensor with shape [nrows+1]. Must not be empty, and must be sorted in ascending order. row_splits[0] must be zero and row_splits[-1] must be nvals.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor. result.rank = values.rank + 1. result.ragged_rank = values.ragged_rank + 1.

Raises
ValueError If row_splits is an empty list.

Example:

>>> print(tf.RaggedTensor.from_row_splits(
...     values=[3, 1, 4, 1, 5, 9, 2, 6],
...     row_splits=[0, 4, 4, 7, 8, 8]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>
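The requirements on row_splits listed under Args can be expressed as a few simple checks. The sketch below is a plain-Python approximation of what validation has to confirm; `validate_row_splits` is a hypothetical helper and does not reproduce the assertions TensorFlow actually emits when validate=True.

```python
def validate_row_splits(row_splits, nvals):
    """Hypothetical helper: check the documented row_splits requirements."""
    if not row_splits:
        raise ValueError("row_splits must not be empty")
    if row_splits[0] != 0:
        raise ValueError("row_splits[0] must be zero")
    # Ascending here means non-decreasing: equal neighbors encode empty rows.
    if any(a > b for a, b in zip(row_splits, row_splits[1:])):
        raise ValueError("row_splits must be sorted in ascending order")
    if row_splits[-1] != nvals:
        raise ValueError("row_splits[-1] must equal nvals")


validate_row_splits([0, 4, 4, 7, 8, 8], nvals=8)  # passes silently
try:
    validate_row_splits([0, 4, 3], nvals=3)
except ValueError as err:
    print(err)  # row_splits must be sorted in ascending order
```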

from_row_starts

Creates a RaggedTensor with rows partitioned by row_starts.

Equivalent to: from_row_splits(values, concat([row_starts, nvals])).

Args
values A potentially ragged tensor with shape [nvals, ...].
row_starts A 1-D integer tensor with shape [nrows]. Must be nonnegative and sorted in ascending order. If nrows>0, then row_starts[0] must be zero.
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor. result.rank = values.rank + 1. result.ragged_rank = values.ragged_rank + 1.

Example:

>>> print(tf.RaggedTensor.from_row_starts(
...     values=[3, 1, 4, 1, 5, 9, 2, 6],
...     row_starts=[0, 4, 4, 7, 8]))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>

from_sparse

Converts a 2D tf.SparseTensor to a RaggedTensor.

Each row of the output RaggedTensor will contain the explicit values from the same row in st_input. st_input must be ragged-right, meaning that the explicit values in each row occupy columns 0, 1, 2, ... with no gaps. If it is not ragged-right, then an error will be generated.

Example:

st = tf.sparse.SparseTensor(indices=[[0, 0], [0, 1], [0, 2], [1, 0], [3, 0]],
                            values=[1, 2, 3, 4, 5],
                            dense_shape=[4, 3])
tf.RaggedTensor.from_sparse(st).to_list()
[[1, 2, 3], [4], [], [5]]

Currently, only two-dimensional SparseTensors are supported.

Args
st_input The sparse tensor to convert. Must have rank 2.
name A name prefix for the returned tensors (optional).
row_splits_dtype dtype for the returned RaggedTensor's row_splits tensor. One of tf.int32 or tf.int64.

Returns
A RaggedTensor with the same values as st_input. output.ragged_rank = rank(st_input) - 1. output.shape = [st_input.dense_shape[0], None].

Raises
ValueError If the number of dimensions in st_input is not known statically, or is not two.
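The ragged-right requirement is what lets each sparse row index act as a row id while the column indices are ignored. The plain-Python sketch below models the conversion on COO data (lists of [row, col] indices and values), assuming the indices are listed in row-major order; `ragged_from_coo` is a hypothetical helper, not a TensorFlow function.

```python
def ragged_from_coo(indices, values, nrows):
    """Hypothetical helper: convert 2-D COO sparse data into ragged rows.

    Assumes indices are in row-major order. Each row must be ragged-right:
    its column indices must be exactly 0, 1, 2, ...
    """
    rows = [[] for _ in range(nrows)]
    for (row, col), value in zip(indices, values):
        if col != len(rows[row]):
            raise ValueError("not ragged-right at index %r" % ([row, col],))
        rows[row].append(value)
    return rows


print(ragged_from_coo([[0, 0], [0, 1], [0, 2], [1, 0], [3, 0]],
                      [1, 2, 3, 4, 5], nrows=4))
# [[1, 2, 3], [4], [], [5]]
```

Rows with no explicit values (row 2 here) become empty rows, and nrows comes from dense_shape[0] so that trailing empty rows are kept.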

from_tensor

Converts a tf.Tensor into a RaggedTensor.

The set of absent/default values may be specified using a vector of lengths or a padding value (but not both). If lengths is specified, then the output tensor will satisfy output[row] = tensor[row][:lengths[row]]. If lengths is a list of lists or a tuple of lists, those lists will be used as nested row lengths. If padding is specified, then any row suffix consisting entirely of padding will be excluded from the returned RaggedTensor. If neither lengths nor padding is specified, then the returned RaggedTensor will have no absent/default values.

Examples:

dt = tf.constant([[5, 7, 0], [0, 3, 0], [6, 0, 0]])
tf.RaggedTensor.from_tensor(dt)
<tf.RaggedTensor [[5, 7, 0], [0, 3, 0], [6, 0, 0]]>
tf.RaggedTensor.from_tensor(dt, lengths=[1, 0, 3])
<tf.RaggedTensor [[5], [], [6, 0, 0]]>

tf.RaggedTensor.from_tensor(dt, padding=0)
<tf.RaggedTensor [[5, 7], [0, 3], [6]]>

dt = tf.constant([[[5, 0], [7, 0], [0, 0]],
                  [[0, 0], [3, 0], [0, 0]],
                  [[6, 0], [0, 0], [0, 0]]])
tf.RaggedTensor.from_tensor(dt, lengths=([2, 0, 3], [1, 1, 2, 0, 1]))
<tf.RaggedTensor [[[5], [7]], [], [[6, 0], [], [0]]]>

Args
tensor The Tensor to convert. Must have rank ragged_rank + 1 or higher.
lengths An optional set of row lengths, specified using a 1-D integer Tensor whose length is equal to tensor.shape[0] (the number of rows in tensor). If specified, then output[row] will contain tensor[row][:lengths[row]]. Negative lengths are treated as zero. You may optionally pass a list or tuple of lengths to this argument, which will be used as nested row lengths to construct a ragged tensor with multiple ragged dimensions.
padding An optional padding value. If specified, then any row suffix consisting entirely of padding will be excluded from the returned RaggedTensor. padding is a Tensor with the same dtype as tensor and with shape=tensor.shape[ragged_rank + 1:].
ragged_rank Integer specifying the ragged rank for the returned RaggedTensor. Must be greater than zero.
name A name prefix for the returned tensors (optional).
row_splits_dtype dtype for the returned RaggedTensor's row_splits tensor. One of tf.int32 or tf.int64.

Returns
A RaggedTensor with the specified ragged_rank. The shape of the returned ragged tensor is compatible with the shape of tensor.

Raises
ValueError If both lengths and padding are specified.
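The two ways of marking absent values can be modeled on a rank-2 dense list with a scalar padding value. This is a simplified sketch under that assumption; `rows_from_lengths` and `rows_from_padding` are hypothetical helpers, not TensorFlow functions.

```python
def rows_from_lengths(dense, lengths):
    """Hypothetical helper: keep dense[row][:lengths[row]]; negative lengths act as zero."""
    return [list(row[:max(length, 0)]) for row, length in zip(dense, lengths)]


def rows_from_padding(dense, padding):
    """Hypothetical helper: drop each row's trailing run of padding values."""
    result = []
    for row in dense:
        end = len(row)
        while end > 0 and row[end - 1] == padding:
            end -= 1
        result.append(list(row[:end]))
    return result


dt = [[5, 7, 0], [0, 3, 0], [6, 0, 0]]
print(rows_from_lengths(dt, [1, 0, 3]))  # [[5], [], [6, 0, 0]]
print(rows_from_padding(dt, 0))          # [[5, 7], [0, 3], [6]]
```

The max(length, 0) clamp matters: without it, a negative length such as -1 would act as a Python slice from the end and silently drop only the last element.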

from_value_rowids

Creates a RaggedTensor with rows partitioned by value_rowids.

The returned RaggedTensor corresponds to the Python list defined by:

result = [[values[i] for i in range(len(values)) if value_rowids[i] == row]
          for row in range(nrows)]

Args
values A potentially ragged tensor with shape [nvals, ...].
value_rowids A 1-D integer tensor with shape [nvals], which corresponds one-to-one with values, and specifies each value's row index. Must be nonnegative, and must be sorted in ascending order.
nrows An integer scalar specifying the number of rows. This should be specified if the RaggedTensor may contain empty trailing rows. Must be greater than value_rowids[-1] (or zero if value_rowids is empty). Defaults to value_rowids[-1] + 1 (or zero if value_rowids is empty).
name A name prefix for the RaggedTensor (optional).
validate If true, then use assertions to check that the arguments form a valid RaggedTensor.

Returns
A RaggedTensor. result.rank = values.rank + 1. result.ragged_rank = values.ragged_rank + 1.

Raises
ValueError If nrows is incompatible with value_rowids.

Example:

>>> print(tf.RaggedTensor.from_value_rowids(
...     values=[3, 1, 4, 1, 5, 9, 2, 6],
...     value_rowids=[0, 0, 0, 0, 2, 2, 2, 3],
...     nrows=5))
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], []]>
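Because value_rowids is sorted, converting it to row_splits amounts to counting how many values land in each row and taking a running sum. The plain-Python sketch below follows the rule that nrows must exceed value_rowids[-1]; `splits_from_value_rowids` is a hypothetical helper, not a TensorFlow function.

```python
def splits_from_value_rowids(value_rowids, nrows=None):
    """Hypothetical helper: count the values in each row, then take a running sum."""
    if nrows is None:
        # Smallest valid row count: one more than the largest row id (0 if no values).
        nrows = value_rowids[-1] + 1 if value_rowids else 0
    if value_rowids and nrows <= value_rowids[-1]:
        raise ValueError("nrows must be greater than value_rowids[-1]")
    counts = [0] * nrows
    for row in value_rowids:
        counts[row] += 1
    splits = [0]
    for count in counts:
        splits.append(splits[-1] + count)
    return splits


print(splits_from_value_rowids([0, 0, 0, 0, 2, 2, 2, 3], nrows=5))  # [0, 4, 4, 7, 8, 8]
print(splits_from_value_rowids([0, 0, 0, 0, 2, 2, 2, 3]))           # [0, 4, 4, 7, 8]
```

The second call shows why nrows should be passed explicitly when trailing rows may be empty: without it, the final empty row cannot be recovered from value_rowids.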

nested_row_lengths

Returns a tuple containing the row_lengths for all ragged dimensions.

rt.nested_row_lengths() is a tuple containing the row_lengths tensors for all ragged dimensions in rt, ordered from outermost to innermost.

Args
name A name prefix for the returned tensors (optional).

Returns
A tuple of 1-D integer Tensors. The length of the tuple is equal to self.ragged_rank.

nested_value_rowids

Returns a tuple containing the value_rowids for all ragged dimensions.

rt.nested_value_rowids() is a tuple containing the value_rowids tensors for all ragged dimensions in rt, ordered from outermost to innermost. In particular, rt.nested_value_rowids() = (rt.value_rowids(),) + value_ids where:

  • value_ids = () if rt.values is a Tensor.
  • value_ids = rt.values.nested_value_rowids() otherwise.

Args
name A name prefix for the returned tensors (optional).

Returns
A tuple of 1-D integer Tensors.