tfio.experimental.columnar.parse_avro


Parses avro records into a dict of tensors.

This op parses serialized avro records into a dictionary mapping keys to Tensor and SparseTensor objects. features is a dict from keys to VarLenFeature, SparseFeature, RaggedFeature, and FixedLenFeature objects. Each VarLenFeature and SparseFeature is mapped to a SparseTensor; each FixedLenFeature is mapped to a Tensor.

Each VarLenFeature maps to a SparseTensor of the specified type representing a ragged matrix. Its indices are [batch, index] where batch identifies the example in serialized, and index is the value's index in the list of values associated with that feature and example.
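The [batch, index] layout described above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the op's implementation; the helper name ragged_to_sparse is hypothetical, and the dense_shape of [batch_size, longest_row] is an assumption carried over from how tf.io.parse_example treats VarLenFeature.

```python
def ragged_to_sparse(rows):
    """Return (indices, values, dense_shape) for a batch of variable-length lists."""
    indices, values = [], []
    for batch, row in enumerate(rows):
        for index, value in enumerate(row):
            # One [batch, index] pair per value, in row-major order.
            indices.append([batch, index])
            values.append(value)
    # Assumption: the second dimension is the length of the longest row.
    max_len = max((len(row) for row in rows), default=0)
    return indices, values, [len(rows), max_len]


# Three examples; the second one has no values for this feature.
indices, values, dense_shape = ragged_to_sparse([[1, 2, 3], [], [4]])
print(indices)      # [[0, 0], [0, 1], [0, 2], [2, 0]]
print(values)       # [1, 2, 3, 4]
print(dense_shape)  # [3, 3]
```

An example with no values contributes no entries to indices or values, but it still counts toward the batch dimension of dense_shape.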

Each SparseFeature maps to a SparseTensor of the specified type representing a Tensor of dense_shape [batch_size] + SparseFeature.size. Its values come from the feature in the examples with key value_key. Each values[i] comes from a position k in the feature of an example at batch entry batch. This positional information is recorded in indices[i] as [batch, index_0, index_1, ...] where index_j is the k-th value of the feature in the example with key SparseFeature.index_key[j]. In other words, we split the indices (except the first index indicating the batch entry) of a SparseTensor by dimension into different features of the avro record. Because SparseFeature is more complex, prefer a VarLenFeature over a SparseFeature whenever possible.
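To make the index bookkeeping concrete, here is a small pure-Python sketch of how the per-dimension index features and the value feature combine. It is an illustration only: the helper sparse_feature_indices and the field names row, col, and val are hypothetical, and each example is modelled as a plain dict of lists rather than an avro record.

```python
def sparse_feature_indices(examples, index_keys, value_key):
    """Combine per-dimension index features with a value feature."""
    indices, values = [], []
    for batch, example in enumerate(examples):
        for k, value in enumerate(example[value_key]):
            # index_j is the k-th entry of the feature stored under index_keys[j].
            indices.append([batch] + [example[key][k] for key in index_keys])
            values.append(value)
    return indices, values


# Hypothetical batch of two examples, each describing a sparse 2-D tensor.
examples = [
    {"row": [0, 1], "col": [2, 0], "val": [0.5, 1.5]},
    {"row": [1], "col": [1], "val": [2.5]},
]
indices, values = sparse_feature_indices(examples, ["row", "col"], "val")
print(indices)  # [[0, 0, 2], [0, 1, 0], [1, 1, 1]]
print(values)   # [0.5, 1.5, 2.5]
```

With SparseFeature.size set to [2, 3], these indices would address a tensor of dense_shape [2, 2, 3].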

Each FixedLenFeature df maps to a Tensor of the specified type (or tf.float32 if not specified) and shape (serialized.size(),) + df.shape. FixedLenFeature entries with a default_value are optional. Without a default_value, parsing fails if that feature is missing from any example in serialized.
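The shape rule and the role of default_value can be sketched in plain Python. This is a simplified model under stated assumptions, not the op itself: the helper dense_batch is hypothetical, examples are plain dicts, values are kept as flat lists, and a ValueError stands in for whatever error the real op raises.

```python
def dense_batch(examples, key, shape, default_value=None):
    """Stack one fixed-length feature across a batch, filling in defaults."""
    expected = 1
    for dim in shape:
        expected *= dim
    rows = []
    for batch, example in enumerate(examples):
        if key in example:
            row = list(example[key])
        elif default_value is not None:
            # The feature is optional because a default was supplied.
            row = list(default_value)
        else:
            raise ValueError(
                f"feature {key!r} missing from example {batch} and no default_value given"
            )
        if len(row) != expected:
            raise ValueError(f"feature {key!r} has {len(row)} values, expected {expected}")
        rows.append(row)
    # Output shape is (batch_size,) + feature shape.
    return rows, (len(examples),) + tuple(shape)


# Hypothetical batch where the second example lacks the feature "x".
examples = [{"x": [1.0, 2.0]}, {}]
rows, out_shape = dense_batch(examples, "x", (2,), default_value=[0.0, 0.0])
print(rows)       # [[1.0, 2.0], [0.0, 0.0]]
print(out_shape)  # (2, 2)
```

Calling dense_batch(examples, "x", (2,)) without a default raises, mirroring the failure described above for a required feature that is missing.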

Use this op within a dataset.map call, for example dataset.map(parser_fn), where parser_fn is a function that calls parse_avro.

Only works for batched serialized input!
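A parser function for dataset.map might look like the sketch below. It is a minimal example under assumptions, not a verified recipe: the schema, its field names user_id and score, and the helper make_parser are hypothetical, and it assumes parse_avro accepts tf.io.FixedLenFeature specs keyed by top-level field name. The keyword arguments are the ones listed on this page.

```python
import json

# Hypothetical reader schema with two scalar fields. It must match the
# reader schema used by the dataset that produced the serialized records.
READER_SCHEMA = json.dumps({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "score", "type": "float"},
    ],
})


def make_parser(reader_schema):
    """Build a parser_fn suitable for dataset.map on batched input."""
    # Imported lazily so the schema above can be inspected without TensorFlow.
    import tensorflow as tf
    import tensorflow_io as tfio

    features = {
        # Required: parsing fails if any record lacks this field.
        "user_id": tf.io.FixedLenFeature([], tf.int64),
        # Optional: falls back to 0.0 when the field is missing.
        "score": tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
    }

    def parser_fn(serialized):
        # `serialized` is expected to be a batch of serialized records.
        return tfio.experimental.columnar.parse_avro(
            serialized=serialized,
            reader_schema=reader_schema,
            features=features,
        )

    return parser_fn


# Usage, assuming `dataset` yields serialized avro records one at a time:
# dataset = dataset.batch(32).map(make_parser(READER_SCHEMA))
```

Batching before the map call matters here, since the op expects batched serialized input.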

Args
serialized: The batched, serialized string tensors.
reader_schema: The reader schema. Note, this MUST match the reader schema from the avro_record_dataset. Otherwise, this op will segfault!
features: A map of feature names mapped to feature information.
avro_names: (Optional.) Descriptive names for the corresponding serialized avro parts. These may be useful for debugging purposes, but they have no effect on the output. If not None, avro_names must be the same length as serialized.
name: The name of the op.

Returns
A map of feature names to tensors.