API for using the tf.data service.
This module contains:
- tf.data server implementations for running the tf.data service.
- A distribute dataset transformation that moves a dataset's preprocessing into the tf.data service.
The tf.data service offers a way to improve training speed when the host attached to a training device can't keep up with the data consumption of the model. For example, suppose a host can generate 100 examples/second, but the model can process 200 examples/second. Training speed could be doubled by using the tf.data service to generate 200 examples/second.
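The arithmetic in this example can be made explicit with a small sketch in plain Python. It is not part of any TensorFlow API, and the function names are hypothetical. It assumes that each tf.data service worker generates examples at the same 100 examples/second as the host, and that throughput scales linearly with the number of workers, ignoring network and coordination overhead. Training runs at the rate of the slower stage.

```python
import math


def pipeline_throughput(producer_rate: float, consumer_rate: float) -> float:
    """Steady-state examples/second when a producer feeds a consumer.

    Training can go no faster than the slower of the two stages.
    """
    return min(producer_rate, consumer_rate)


def workers_needed(per_worker_rate: float, consumer_rate: float) -> int:
    """Smallest number of equally fast producers that saturates the consumer.

    Assumes (hypothetically) that producer throughput adds up linearly.
    """
    return math.ceil(consumer_rate / per_worker_rate)


host_rate = 100.0   # examples/second the training host can generate
model_rate = 200.0  # examples/second the model can consume

baseline = pipeline_throughput(host_rate, model_rate)
n = workers_needed(host_rate, model_rate)
with_service = pipeline_throughput(n * host_rate, model_rate)

print(baseline, n, with_service, with_service / baseline)  # 100.0 2 200.0 2.0
```

Under these assumptions the speedup is capped by the model's consumption rate: a third worker would still yield only 200 examples/second.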
Before using the tf.data service
There are a few things to do before using the tf.data service to speed up training.
The tf.data service uses a cluster of workers to prepare data for training your
model. The processing_mode argument to
tf.data.experimental.service.distribute describes how to leverage multiple
workers to process the input dataset. Currently, there are two processing modes
to choose from: "distributed_epoch" and "parallel_epochs".
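The difference between the two modes can be illustrated with a toy model in plain Python. This is not how the tf.data service is implemented, and all names here (including the filenames) are hypothetical. In "distributed_epoch", a dispatcher walks a splittable source and hands each split to one worker, so every element is processed exactly once; the model uses round-robin assignment as a simplification, whereas the real dispatcher hands out splits as workers need them. In "parallel_epochs", every worker processes the whole dataset, and each shuffles independently so that workers visit elements in different orders.

```python
import random
from collections import Counter


def distributed_epoch(source, num_workers):
    """Toy model of "distributed_epoch": one pass split across workers.

    Round-robin assignment is a simplification for illustration only.
    """
    assignments = {worker: [] for worker in range(num_workers)}
    for i, split in enumerate(source):
        assignments[i % num_workers].append(split)
    return assignments


def parallel_epochs(source, num_workers):
    """Toy model of "parallel_epochs": every worker sees the whole dataset.

    Each worker shuffles with its own unseeded generator, so the order
    differs from worker to worker and from run to run.
    """
    assignments = {}
    for worker in range(num_workers):
        order = list(source)
        random.Random().shuffle(order)  # non-deterministic per worker
        assignments[worker] = order
    return assignments


def visit_counts(assignments):
    """How many times each element is processed across all workers."""
    return Counter(item for items in assignments.values() for item in items)


# Hypothetical filenames standing in for a file-based input pipeline.
files = [f"file_{i}.tfrecord" for i in range(6)]

de = distributed_epoch(files, num_workers=3)
pe = parallel_epochs(files, num_workers=3)

print(de[0])                           # ['file_0.tfrecord', 'file_3.tfrecord']
print(set(visit_counts(de).values()))  # {1}: each file processed once
print(set(visit_counts(pe).values()))  # {3}: each file processed by every worker
```

The counts show why shuffling matters in "parallel_epochs": each element is processed once per worker, so without independent shuffling every worker would emit the same sequence.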
"distributed_epoch" means that the dataset will be split across all tf.data
service workers. The dispatcher produces "splits" for the dataset and sends them
to workers for further processing. For example, if a dataset begins with a list
of filenames, the dispatcher will iterate through the filenames and send the
filenames to tf.data workers, which will perform the rest of the dataset
transformations on those files. "distributed_epoch" is useful when your model
needs to see each element of the dataset exactly once, or if it needs to see the
data in a generally sequential order. "distributed_epoch" only works for
datasets with splittable sources, such as
"parallel_epochs" means that the entire input dataset will be processed independently by each of the tf.data service workers. For this reason, it is important to shuffle data (e.g. filenames) non-deterministically, so that each worker will process the elements of the dataset in a different order. "parallel_epochs"