Module: tf.data.experimental.service

API for using the tf.data service.

This module contains:

  1. tf.data server implementations for running the tf.data service.
  2. A distribute dataset transformation that moves a dataset's preprocessing to happen in the tf.data service (see the sketch after this list).
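For illustration, here is a minimal end-to-end sketch, assuming a recent TensorFlow 2.x release: it starts a dispatcher and a single worker in the same process (in a real deployment these servers would run on separate hosts) and then distributes a small dataset through them.

    import tensorflow as tf

    # Start an in-process dispatcher and worker. In production these run on
    # separate machines; here they run locally for illustration.
    dispatcher = tf.data.experimental.service.DispatchServer()
    dispatcher_address = dispatcher.target.split("://")[1]
    worker = tf.data.experimental.service.WorkerServer(
        tf.data.experimental.service.WorkerConfig(
            dispatcher_address=dispatcher_address))

    # Route the dataset's preprocessing through the service.
    dataset = tf.data.Dataset.range(10)
    dataset = dataset.apply(tf.data.experimental.service.distribute(
        processing_mode="parallel_epochs", service=dispatcher.target))
    print(list(dataset.as_numpy_iterator()))  # the elements 0..9, in some order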

The tf.data service offers a way to improve training speed when the host attached to a training device can't keep up with the data consumption of the model. For example, suppose a host can generate 100 examples/second, but the model can process 200 examples/second. Training speed could be doubled by using the tf.data service to spread the preprocessing across enough workers to produce 200 examples/second.

Before using the tf.data service

There are a few things to do before using the tf.data service to speed up training.

Understand processing_mode

The tf.data service uses a cluster of workers to prepare data for training your model. The processing_mode argument to tf.data.experimental.service.distribute describes how to leverage multiple workers to process the input dataset. Currently, there are two processing modes to choose from: "distributed_epoch" and "parallel_epochs".
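Concretely, processing_mode is passed directly to distribute. A minimal sketch; the dispatcher address here is a placeholder, not a real deployment:

    import tensorflow as tf

    dataset = tf.data.Dataset.range(100)
    dataset = dataset.apply(tf.data.experimental.service.distribute(
        processing_mode="distributed_epoch",  # or "parallel_epochs"
        service="grpc://localhost:5000"))     # placeholder dispatcher address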

"distributed_epoch" means that the dataset will be split across all tf.data service workers. The dispatcher produces "splits" for the dataset and sends them to workers for further processing. For example, if a dataset begins with a list of filenames, the dispatcher will iterate through the filenames and send the filenames to tf.data workers, which will perform the rest of the dataset transformations on those files. "distributed_epoch" is useful when your model needs to see each element of the dataset exactly once, or if it needs to see the data in a generally-sequential order. "distributed_epoch" only works for datasets with splittable sources, such as Dataset.from_tensor_slices, Dataset.list_files, or Dataset.range.

"parallel_epochs" means that the entire input dataset will be processed independently by each of the tf.data service workers. For this reason, it is important to shuffle data (e.g. filenames) non-deterministically, so that each worker will process the elements of the dataset in a different order. "parallel_epochs"