tf.distribute.DistributedDataset

Represents a dataset distributed among devices and machines.

A tf.distribute.DistributedDataset can be thought of as a "distributed" dataset. When you use the tf.distribute API to scale training to multiple devices or machines, you also need to distribute the input data, which yields a tf.distribute.DistributedDataset instance instead of the tf.data.Dataset instance used in the non-distributed case. In TF 2.x, tf.distribute.DistributedDataset objects are Python iterables.
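As a rough illustration of the iterable behavior, the sketch below distributes a small tf.data.Dataset with tf.distribute.Strategy.experimental_distribute_dataset and loops over the result with a plain for loop. The choice of tf.distribute.MirroredStrategy, the toy Dataset.range(16) input, and the per-replica batch size of 4 are assumptions made for this example only, not something the API requires.

```python
import tensorflow as tf

# Assumption: MirroredStrategy on whatever devices are available
# (it uses a single replica when only one device is present).
strategy = tf.distribute.MirroredStrategy()

# Batch with the *global* batch size; the strategy splits each
# batch across the replicas in sync.
global_batch_size = 4 * strategy.num_replicas_in_sync
dataset = tf.data.Dataset.range(16).batch(global_batch_size)

# Turn the tf.data.Dataset into a distributed dataset.
dist_dataset = strategy.experimental_distribute_dataset(dataset)

# The distributed dataset is a Python iterable, so a for loop works.
total = 0
for batch in dist_dataset:
    # Unpack the per-replica values into a tuple of local tensors.
    local_tensors = strategy.experimental_local_results(batch)
    total += sum(int(tf.reduce_sum(t)) for t in local_tensors)

print("sum of all elements:", total)
```

In a real training loop, each element would typically be passed to tf.distribute.Strategy.run rather than unpacked on the host as done here.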

There are two APIs to create a tf.distribute.DistributedDataset object: tf.distribute.Strategy.experimental_distribute_dataset(dataset) and tf.distribute.Strategy.distribute_datasets_from_function(dataset_fn). When to use which? When you have a tf.data.Dataset instance, and the regular batch splitting (i.e. re-batching the input tf.data.Dataset instance with a new batch size equal to the global batch size divided by the number of replicas in sync) and autosharding (i.e. the