Run inference over pre-batched keyed inputs on multiple models.

This API is experimental and may change in the future.

Supports the same inference specs as RunInferencePerModel. Inputs must be keyed lists of examples; outputs are keyed lists of prediction logs that correspond to the input examples by index.

examples A PCollection of keyed, batched inputs of type Example, SequenceExample, or bytes. Each type supports the inference specs corresponding to the unbatched cases described in RunInferencePerModel. Supports:
  - PCollection[Tuple[K, List[Example]]]
  - PCollection[Tuple[K, List[SequenceExample]]]
  - PCollection[Tuple[K, List[Bytes]]]
inference_spec_types A flat iterable of Model inference endpoints. Inference happens in a fused fashion (i.e., without data materialization), sequentially across Models within a Beam thread (but in parallel across threads and workers).
load_override_fn Optional function that takes a model path and a sequence of tags and returns a tf SavedModel. The loaded model must be equivalent in interface to the model that would otherwise be loaded. It is up to the caller to ensure compatibility. This argument is experimental and subject to change.

A PCollection of Tuples, each pairing a key with a Tuple of batched prediction logs, one batch per model provided in inference_spec_types. The Tuple of batched prediction logs is 1-1 aligned with inference_spec_types. The individual prediction logs in each batch are 1-1 aligned with the rows of data in the input batch for that key.
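
The alignment rules described above can be illustrated with a minimal pure-Python sketch. It does not call this API or Beam: run_models_on_keyed_batch, length_model, and upper_model are hypothetical stand-ins, and plain strings take the place of prediction logs. It only shows the shape of one output element for one keyed input batch.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for a model endpoint: maps a batch of serialized
# examples to one "prediction log" (here just a string) per example.
Model = Callable[[List[bytes]], List[str]]


def run_models_on_keyed_batch(
    keyed_batch: Tuple[Hashable, List[bytes]],
    models: Sequence[Model],
) -> Tuple[Hashable, Tuple[List[str], ...]]:
    """Applies each model, in order, to the same keyed batch."""
    key, batch = keyed_batch
    per_model_logs = []
    for model in models:  # sequential across models, in the order given
        logs = model(batch)
        if len(logs) != len(batch):
            raise ValueError("Each model must return one log per input row.")
        per_model_logs.append(logs)
    # Outer tuple: one entry per model. Inner list: one entry per input row.
    return key, tuple(per_model_logs)


def length_model(batch: List[bytes]) -> List[str]:
    # Toy model: reports the byte length of each input.
    return [f"len={len(x)}" for x in batch]


def upper_model(batch: List[bytes]) -> List[str]:
    # Toy model: upper-cases the decoded input.
    return [x.decode("utf-8").upper() for x in batch]


key, outputs = run_models_on_keyed_batch(
    ("user-1", [b"ab", b"cde"]),
    [length_model, upper_model],
)
print(key, outputs)
```

Here outputs[0] holds the logs from length_model and outputs[1] those from upper_model, mirroring the order of the model list, while position i inside each list corresponds to row i of the input batch.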