text.span_alignment

Return an alignment from a set of source spans to a set of target spans.

The source and target spans are specified using B+1 dimensional tensors, with B>=0 batch dimensions followed by a final dimension that lists the span offsets for each span in the batch:

  • The ith source span in batch b1...bB starts at source_start[b1...bB, i] (inclusive), and extends to just before source_limit[b1...bB, i] (exclusive).
  • The jth target span in batch b1...bB starts at target_start[b1...bB, j] (inclusive), and extends to just before target_limit[b1...bB, j] (exclusive).
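Because each span is half-open, `[start, limit)`, the offsets it covers are exactly `range(start, limit)`. As a result, two adjacent spans such as `[0, 5)` and `[5, 10)` share no offset. The following is a minimal pure-Python sketch for the unbatched case (B=0). The helper name `span_offsets` is hypothetical and is not part of `tensorflow_text`:

```python
def span_offsets(starts, limits):
    """Hypothetical helper: expand each half-open span [start, limit) into the
    set of offsets it covers (unbatched case, B = 0)."""
    if len(starts) != len(limits):
        raise ValueError("starts and limits must have the same length")
    return [set(range(start, limit)) for start, limit in zip(starts, limits)]


# Two adjacent spans: [0, 5) and [5, 10).
spans = span_offsets([0, 5], [5, 10])
print(spans[0])             # {0, 1, 2, 3, 4}: the limit 5 is excluded
print(spans[0] & spans[1])  # set(): adjacent spans do not share an offset
```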

result[b1...bB, i] contains the index (or indices) of the target span(s) that overlap the ith source span in batch b1...bB. The multivalent_result parameter controls whether the result holds a single aligned target span for each source span, or all target spans that align with it.

  • If multivalent_result is false (the default), then result[b1...bB, i]=j indicates that the jth target span overlaps the ith source span in batch b1...bB. If no target span overlaps the ith source span, then result[b1...bB, i]=-1.

  • If multivalent_result is true, then result[b1...bB, i, n]=j indicates that the jth target span is the nth span that overlaps the ith source span in batch b1...bB.
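Both result forms can be derived from a boolean overlap matrix for one batch element, where `overlaps[i][j]` is true when source span i overlaps target span j. The sketch below is illustrative only and is not the library's implementation. The function names are hypothetical. The choice of which index the single-valued form reports when several target spans overlap is also an assumption; this sketch picks the lowest one.

```python
def single_valued(overlaps):
    """Sketch of multivalent_result=False: one target index per source span,
    or -1 when none overlaps. Assumes the lowest overlapping index is reported."""
    return [next((j for j, hit in enumerate(row) if hit), -1) for row in overlaps]


def multivalent(overlaps):
    """Sketch of multivalent_result=True: every overlapping target index, in order."""
    return [[j for j, hit in enumerate(row) if hit] for row in overlaps]


# Three source spans against three target spans.
overlaps = [
    [True, False, False],   # source 0 overlaps target 0
    [False, False, False],  # source 1 overlaps nothing
    [False, True, True],    # source 2 overlaps targets 1 and 2
]
print(single_valued(overlaps))  # [0, -1, 1]
print(multivalent(overlaps))    # [[0], [], [1, 2]]
```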

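By default (with contains, contained_by, and partial_overlap all false), a source span aligns with a target span only when the two spans are identical. For the full definition of span overlap, see the docstring for span_overlaps().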
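The overlap rules can be expressed as a predicate over a single pair of spans. This sketch assumes the rules that span_overlaps() describes: identical spans always overlap, and each flag enables one more kind of overlap. It also treats "partial overlap" as any non-empty intersection of the two half-open spans. The name `spans_overlap` is hypothetical.

```python
def spans_overlap(source, target, contains=False, contained_by=False,
                  partial_overlap=False):
    """Hypothetical predicate: does half-open `source` overlap half-open `target`?

    Each span is a (start, limit) pair. Identical spans always overlap; each
    flag enables one additional kind of overlap.
    """
    s_start, s_limit = source
    t_start, t_limit = target
    if (s_start, s_limit) == (t_start, t_limit):
        return True
    if contains and s_start <= t_start and t_limit <= s_limit:
        return True
    if contained_by and t_start <= s_start and s_limit <= t_limit:
        return True
    # Half-open intervals intersect iff each one starts before the other ends.
    if partial_overlap and s_start < t_limit and t_start < s_limit:
        return True
    return False


print(spans_overlap((30, 35), (31, 34)))                        # False: not identical
print(spans_overlap((30, 35), (31, 34), contains=True))         # True: source contains target
print(spans_overlap((10, 15), (15, 20), partial_overlap=True))  # False: they only touch
```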

Examples:

Given the following source and target spans (with no batch dimensions):

#         0    5    10   15   20   25   30   35   40   45   50   55   60
#         |====|====|====|====|====|====|====|====|====|====|====|====|
# Source: [-0-]     [-1-] [2] [3]    [4][-5-][-6-][-7-][-8-][-9-]
# Target: [-0-][-1-]     [-2-][-3-][-4-] [5] [6]    [7]  [-8-][-9-][10]
#         |====|====|====|====|====|====|====|====|====|====|====|====|
>>> source_starts = [0, 10, 16, 20, 27, 30, 35, 40, 45, 50]
>>> source_limits = [5, 15, 19, 23, 30, 35, 40, 45, 50, 55]
>>> target_starts = [0,  5, 15, 20, 25, 31, 35, 42, 47, 52, 57]
>>> target_limits = [5, 10, 20, 25, 30, 34, 38, 45, 52, 57, 61]
>>> span_alignment(source_starts, source_limits, target_starts, target_limits)
<tf.Tensor: shape=(10,), dtype=int64,
    numpy=array([ 0, -1, -1, -1, -1, -1, -1, -1, -1, -1])>
>>> span_alignment(source_starts, source_limits, target_starts, target_limits,
...                multivalent_result=True)
<tf.RaggedTensor [[0], [], [], [], [], [], [], [], [], []]>
>>> span_alignment(source_starts, source_limits, target_starts, target_limits,
...                contains=True)
<tf.Tensor: shape=(10,), dtype=int64,
    numpy=array([ 0, -1, -1, -1, -1,  5,  6,  7, -1, -1])>
>>> span_alignment(source_starts, source_limits, target_starts, target_limits,
...                partial_overlap=True, multivalent_result=True)
<tf.RaggedTensor [[0], [], [2], [3], [4], [5], [6], [7], [8], [8, 9]]>
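The expected outputs above can be checked with a pure-Python sketch of the unbatched case. It follows the documented semantics but uses plain lists instead of tensors. The name `align_spans` is hypothetical, and reporting the lowest matching index in the single-valued case is an assumption (in these examples, at most one target span aligns whenever the single-valued form is used).

```python
def align_spans(source_starts, source_limits, target_starts, target_limits,
                contains=False, contained_by=False, partial_overlap=False,
                multivalent_result=False):
    """Pure-Python sketch of span alignment for a single batch element (B = 0)."""

    def overlaps(s_start, s_limit, t_start, t_limit):
        if (s_start, s_limit) == (t_start, t_limit):
            return True  # identical spans always align
        if contains and s_start <= t_start and t_limit <= s_limit:
            return True
        if contained_by and t_start <= s_start and s_limit <= t_limit:
            return True
        return partial_overlap and s_start < t_limit and t_start < s_limit

    targets = list(zip(target_starts, target_limits))
    result = []
    for s_start, s_limit in zip(source_starts, source_limits):
        hits = [j for j, (t_start, t_limit) in enumerate(targets)
                if overlaps(s_start, s_limit, t_start, t_limit)]
        # Single-valued form: lowest matching index (an assumption), or -1.
        result.append(hits if multivalent_result else (hits[0] if hits else -1))
    return result


source_starts = [0, 10, 16, 20, 27, 30, 35, 40, 45, 50]
source_limits = [5, 15, 19, 23, 30, 35, 40, 45, 50, 55]
target_starts = [0,  5, 15, 20, 25, 31, 35, 42, 47, 52, 57]
target_limits = [5, 10, 20, 25, 30, 34, 38, 45, 52, 57, 61]

print(align_spans(source_starts, source_limits, target_starts, target_limits))
# [0, -1, -1, -1, -1, -1, -1, -1, -1, -1]
print(align_spans(source_starts, source_limits, target_starts, target_limits,
                  multivalent_result=True))
# [[0], [], [], [], [], [], [], [], [], []]
print(align_spans(source_starts, source_limits, target_starts, target_limits,
                  contains=True))
# [0, -1, -1, -1, -1, 5, 6, 7, -1, -1]
print(align_spans(source_starts, source_limits, target_starts, target_limits,
                  partial_overlap=True, multivalent_result=True))
# [[0], [], [2], [3], [4], [5], [6], [7], [8], [8, 9]]
```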

Args:

  • source_start: A (B+1)-dimensional, potentially ragged tensor with shape [D1...DB, source_size] giving the start offset of each source span.
  • source_limit: A (B+1)-dimensional, potentially ragged tensor with shape [D1...DB, source_size] giving the limit offset of each source span.
  • target_start: A (B+1)-dimensional, potentially ragged tensor with shape [D1...DB, target_size] giving the start offset of each target span.
  • target_limit: A (B+1)-dimensional, potentially ragged tensor with shape [D1...DB, target_size] giving the limit offset of each target span.
  • contains: If true, a source span is considered to overlap a target span when the source span contains the target span. Defaults to False.
  • contained_by: If true, a source span is considered to overlap a target span when the source span is contained by the target span. Defaults to False.
  • partial_overlap: If true, a source span is considered to overlap a target span when the source span partially overlaps the target span. Defaults to False.
  • multivalent_result: Whether the result should contain a single target span index (multivalent_result=False, the default) or a list of target span indices (multivalent_result=True) for each source span.
  • name: A name for the operation (optional).

Returns:

An int64 tensor with values in the range -1 <= result < target_size. If multivalent_result=False, the returned tensor has shape [D1...DB, source_size], where source_size is the size of the last dimension of source_start and source_limit. If multivalent_result=True, the returned value is a ragged tensor with shape [D1...DB, source_size, (num_aligned_target_spans)], holding one list of aligned target span indices per source span.
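With one batch dimension (B=1), alignment is computed independently for each batch element. Each row of the result therefore has one entry per source span in that row, and its values index into that row's own target spans, which keeps them within -1 <= result < target_size. The sketch below uses nested Python lists to stand in for ragged tensors. It assumes only identical spans align (contains, contained_by, and partial_overlap all false). `batched_alignment` is a hypothetical name.

```python
def batched_alignment(source_start, source_limit, target_start, target_limit):
    """Sketch of single-valued alignment with one batch dimension (B = 1).

    Only identical spans align here (contains, contained_by, partial_overlap
    all false). Rows may hold different numbers of spans, as with a ragged
    tensor; returned indices are local to each row's own target spans.
    """
    result = []
    for s_starts, s_limits, t_starts, t_limits in zip(
            source_start, source_limit, target_start, target_limit):
        targets = list(zip(t_starts, t_limits))
        row = []
        for span in zip(s_starts, s_limits):
            # list.index returns the first (lowest) matching target index.
            row.append(targets.index(span) if span in targets else -1)
        result.append(row)
    return result


result = batched_alignment(
    source_start=[[0, 4], [2]],
    source_limit=[[3, 6], [5]],
    target_start=[[4, 0], [0, 2]],
    target_limit=[[6, 2], [2, 5]])
print(result)  # [[-1, 0], [1]]: ragged, one entry per source span in each row
```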