tfr.data.parse_from_sequence_example

Parses SequenceExample to feature maps.

The FixedLenFeature in example_feature_spec is converted to FixedLenSequenceFeature to parse feature_list in SequenceExample. We keep track of the non-trivial default_values (e.g., -1 for labels) for features in example_feature_spec and use them to replace the parsing defaults of the SequenceExample (i.e., 0 for numbers and "" for strings). Due to this complexity, we only allow scalar non-trivial default values for numbers.
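The default-value handling described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name `replace_parse_defaults` and the use of explicit list lengths to find padded slots are assumptions made for illustration. The sketch pads ragged per-example labels with the parsing default (0.0), then overwrites only the padded slots with a non-trivial scalar default (-1.0).

```python
def replace_parse_defaults(batch, parse_default=0.0, user_default=-1.0):
    """Pad ragged lists, then swap the parsing default for user_default.

    Hypothetical helper: only slots created by padding are replaced, so a
    genuine 0.0 label is left untouched.
    """
    max_len = max((len(row) for row in batch), default=0)
    result = []
    for row in batch:
        # Step 1: mimic parsing, which fills missing frames with 0.0.
        padded = list(row) + [parse_default] * (max_len - len(row))
        # Step 2: replace only the slots beyond the real list length.
        fixed = [
            value if i < len(row) else user_default
            for i, value in enumerate(padded)
        ]
        result.append(fixed)
    return result


labels = replace_parse_defaults([[0.0, 1.0, 0.0], [1.0]])
print(labels)
```

Restricting the default to a scalar keeps the substitution a simple element-wise replacement, which is consistent with the restriction stated above.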

When list_size is None, the 2nd dim of the output Tensors is not fixed and varies from batch to batch. When list_size is specified as a positive integer, truncation or padding is applied so that the 2nd dim of the output Tensors equals the specified list_size.
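The effect of list_size on a single example list can be sketched in plain Python. This is an illustration only, not the library's code; `fit_to_list_size` and `pad_value` are hypothetical names, and the sketch operates on one list rather than a batched Tensor.

```python
def fit_to_list_size(frames, list_size=None, pad_value=0.0):
    """Truncate or pad one example list to list_size (hypothetical helper)."""
    if list_size is None:
        # Dynamic size: the list is kept as-is.
        return list(frames)
    if list_size <= 0:
        raise ValueError("list_size must be a positive integer or None")
    kept = list(frames[:list_size])  # truncation when the list is too long
    # Padding when the list is too short; adds nothing otherwise.
    return kept + [pad_value] * (list_size - len(kept))


truncated = fit_to_list_size([0.0, 1.0, 2.0], list_size=2)
padded = fit_to_list_size([0.0, 1.0], list_size=4)
unchanged = fit_to_list_size([0.0, 1.0, 2.0])
```

With a fixed list_size every list comes out the same length, which is what allows the 2nd dim of the batched output to be static.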

Example:

serialized = [
  sequence_example {
    context {
      feature {
        key: "query_length"
        value { int64_list { value: 3 } }
      }
    }
    feature_lists {
      feature_list {
        key: "unigrams"
        value {
          feature { bytes_list { value: "tensorflow" } }
          feature { bytes_list { value: ["learning", "to", "rank"] } }
        }
      }
      feature_list {
        key: "utility"
        value {
          feature { float_list { value: 0.0 } }
          feature { float_list { value: 1.0 } }
        }
      }
    }
  }
  sequence_example {
    context {
      feature {
        key: "query_length"
        value { int64_list { value: 2 } }
      }
    }
    feature_lists {
      feature_list {
        key: "unigrams"
        value {
          feature { bytes_list { value: "gbdt" } }
          feature { }
        }
      }
      feature_list {
        key: "utility"
        value {
          feature { float_list { value: 0.0 } }
          feature { float_list { value: 0.0 } }
        }
      }
    }
  }
]

We can use the following arguments:

context_feature_spec: {
  "query_length": tf.io.FixedLenFeature([1], tf.int64)
}
example_feature_spec: {
  "unigrams": tf.io.VarLenFeature(tf.string),
  "utility": tf.io.FixedLenFeature([1], tf.float32,
    default_value=[0.])
}

And the expected output is:

{
  "unigrams": SparseTensor(
    indices=array([[0, 0, 0], [0, 1, 0], [0, 1, 1], [0, 1, 2], [1, 0, 0]]),
    values=["tensorflow", "learning", "to", "rank", "gbdt"],
    dense_shape=array([2, 2, 3])),
  "utility": [[[ 0.], [ 1.]], [[ 0.], [ 0.]]],
  "query_length": [[3], [2]],
}
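As a sanity check on the sparse "unigrams" output, the following pure-Python sketch derives COO-style indices from the nested structure of the example above (batch, then frame, then token). The function `to_sparse` is a hypothetical helper written for this illustration and does not call TensorFlow; it only shows that each value gets exactly one [batch, frame, position] index and that an empty frame contributes none.

```python
def to_sparse(batch):
    """Convert batch -> frames -> tokens nesting into COO-style parts."""
    indices, values = [], []
    for b, frames in enumerate(batch):
        for f, tokens in enumerate(frames):
            for t, token in enumerate(tokens):
                indices.append([b, f, t])
                values.append(token)
    # Dense shape is the maximum extent along each of the three axes.
    num_frames = max((len(frames) for frames in batch), default=0)
    num_tokens = max(
        (len(tokens) for frames in batch for tokens in frames), default=0
    )
    return indices, values, [len(batch), num_frames, num_tokens]


# The "unigrams" feature lists from the two serialized examples.
unigrams = [
    [["tensorflow"], ["learning", "to", "rank"]],
    [["gbdt"], []],  # the second frame is an empty feature
]
indices, values, dense_shape = to_sparse(unigrams)
```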

Args:

serialized (Tensor): A string Tensor for a batch of serialized SequenceExample.
list_size (int): The number of frames to keep for a SequenceExample. If specified, truncation or padding may happen. Otherwise, the output Tensors have a dynamic list size.
context_feature_spec (dict): A mapping from feature keys to FixedLenFeature or VarLenFeature values for context.
example_feature_spec (dict): A mapping from feature keys to FixedLenFeature or VarLenFeature values for the list of examples. These features are stored in the feature_lists field in SequenceExample. FixedLenFeature is translated to FixedLenSequenceFeature to parse SequenceExample. Note that missing values in the middle of a feature_list are not allowed for frames.
size_feature_name (str): Name of the feature for example list sizes. Populates the feature dictionary with a tf.int32 Tensor of shape [batch_size] for this feature name. If None (the default), this feature is not generated.
mask_feature_name (str): Name of the feature for example list masks. Populates the feature dictionary with a tf.bool Tensor of shape [batch_size, list_size] for this feature name. If None (the default), this feature is not generated.
shuffle_examples (bool): Whether examples within a list are shuffled before the list is trimmed down to list_size elements. This applies when the list has more than list_size elements.
seed (int): A seed passed on to random_ops.uniform() to shuffle examples.
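The size, mask, and shuffle arguments can be pictured with the plain-Python sketch below. It is an illustration under stated assumptions, not the library's implementation: `describe_lists` and `shuffle_then_trim` are hypothetical helpers, clipping the reported size to list_size is an assumption, and Python's `random.Random` stands in for random_ops.uniform().

```python
import random


def describe_lists(lengths, list_size):
    """Return per-list sizes and a [batch_size, list_size] boolean mask."""
    # Assumption: sizes are clipped to list_size after truncation.
    sizes = [min(n, list_size) for n in lengths]
    # True marks a real example, False marks a padded slot.
    mask = [[i < n for i in range(list_size)] for n in sizes]
    return sizes, mask


def shuffle_then_trim(examples, list_size, seed=None):
    """Shuffle a copy of the list, then keep the first list_size items."""
    rng = random.Random(seed)  # stand-in for the seeded TensorFlow op
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:list_size]


sizes, mask = describe_lists([3, 1], list_size=2)
kept = shuffle_then_trim(["a", "b", "c", "d"], list_size=2, seed=7)
```

Shuffling before trimming means that, for lists longer than list_size, the surviving examples are a random subset rather than always the first list_size entries.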

Returns:

A mapping from feature keys to Tensor or SparseTensor.