
API for using the tf.data service.

This module contains:

  1. tf.data server implementations for running the tf.data service.
  2. APIs for registering datasets with the tf.data service and reading from the registered datasets.

The tf.data service provides the following benefits:

  • Horizontal scaling of tf.data input pipeline processing to solve input bottlenecks.
  • Data coordination for distributed training. Coordinated reads enable all replicas to train on similar-length examples across each global training step, improving step times in synchronous training.
  • Dynamic balancing of data across training replicas.
dispatcher = tf.data.experimental.service.DispatchServer()
dispatcher_address = dispatcher.target.split("://")[1]
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher_address))
dataset = tf.data.Dataset.range(10)
dataset = dataset.apply(tf.data.experimental.service.distribute(
    processing_mode="parallel_epochs", service=dispatcher.target))
print(list(dataset.as_numpy_iterator()))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Setup

This section goes over how to set up the tf.data service.

Run tf.data servers

The tf.data service consists of one dispatch server and n worker servers. tf.data servers should be brought up alongside your training jobs, then brought down when the jobs are finished. Use tf.data.experimental.service.DispatchServer to start a dispatch server, and tf.data.experimental.service.WorkerServer to start worker servers. Servers can be run in the same process for testing purposes, or scaled up on separate machines.
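As a minimal sketch, both server types can be started in a single process for testing. Passing port=0 asks for any free port; in a real deployment the dispatcher would run on its own machine with a fixed, well-known address:

```python
import tensorflow as tf

# Start a dispatch server on a free port (port=0 picks one automatically).
dispatcher = tf.data.experimental.service.DispatchServer(
    tf.data.experimental.service.DispatcherConfig(port=0))

# Start a worker server pointed at the dispatcher's address.
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher.target.split("://")[1]))

print(dispatcher.target)  # e.g. grpc://localhost:<port>
```

In a dedicated server process you would typically call dispatcher.join() (or worker.join()) afterwards to block until the server shuts down.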

See the tf.data service example in the tensorflow/ecosystem repository for an example of using Google Kubernetes Engine (GKE) to manage the tf.data service. Note that the server implementation in that example is not GKE-specific, and can be used to run the tf.data service in other contexts.

Custom ops

If your dataset uses custom ops, these ops need to be made available to tf.data servers by calling load_op_library from the dispatcher and worker processes at startup.


Usage

Users interact with tf.data service by programmatically registering their datasets with tf.data service, then creating datasets that read from the registered datasets. The register_dataset function registers a dataset, then the from_dataset_id function creates a new dataset which reads from the registered dataset. The distribute function wraps register_dataset and from_dataset_id into a single convenient transformation which registers its input dataset and then reads from it.

distribute enables tf.data service to be used with a one-line code change. However, it assumes that the dataset is created and consumed by the same entity, and this assumption might not always be valid or desirable. In particular, in certain scenarios, such as distributed training, it might be desirable to decouple the creation and consumption of the dataset (via register_dataset and from_dataset_id respectively) to avoid having to create the dataset on each of the training workers.
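The decoupled pattern might look roughly like the following sketch. The in-process servers exist only to make the sketch self-contained; in distributed training the registration would happen once (e.g. on the chief), the resulting dataset_id would be shared with the training workers, and each worker would only call from_dataset_id:

```python
import tensorflow as tf

# In-process dispatcher and worker, just so the sketch is self-contained.
dispatcher = tf.data.experimental.service.DispatchServer()
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher.target.split("://")[1]))

dataset = tf.data.Dataset.range(5)

# One entity registers the dataset with the service...
dataset_id = tf.data.experimental.service.register_dataset(
    service=dispatcher.target, dataset=dataset)

# ...and each consumer reads from it by id, without re-creating the dataset.
consumer = tf.data.experimental.service.from_dataset_id(
    processing_mode="parallel_epochs",
    service=dispatcher.target,
    dataset_id=dataset_id,
    element_spec=dataset.element_spec)
elements = list(consumer.as_numpy_iterator())
print(elements)
```

Note that from_dataset_id needs the element_spec of the registered dataset, since the consumer may not have the original dataset object available to infer it from.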



To use the distribute transformation, apply the transformation after the prefix of your input pipeline that you would like to be executed using tf.data service (typically at the end).

dataset = ...  # Define your dataset here.
# Move dataset processing from the local machine to the tf.data service
dataset = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode="parallel_epochs",
        service=FLAGS.tf_data_service_address,
        job_name="shared_job"))