tf.distribute.Strategy

A state & compute distribution policy on a list of devices.

See the guide for an overview and examples. See tf.distribute.StrategyExtended and tf.distribute for a glossary of concepts mentioned on this page, such as "per-replica", "replica", and "reduce".
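
Strategies are typically consumed either through the high-level Keras APIs or through a custom training loop. A minimal sketch of the Keras path, assuming MirroredStrategy and a toy model (both are illustrative choices, not prescribed by this page):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # illustrative choice of strategy

with strategy.scope():
  # Variables created under the scope are distributed per the policy.
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
  model.compile(optimizer="sgd", loss="mse")

Model.fit then distributes the input data across replicas automatically.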

For custom training loops, usage can be as simple as:

import tensorflow as tf

EPOCHS = 10  # illustrative value
my_strategy = tf.distribute.MirroredStrategy()  # illustrative choice of strategy
# Any tf.data.Dataset works here; this toy dataset is an assumption.
dataset = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform([64, 4])).batch(8)

with my_strategy.scope():
  @tf.function
  def distribute_train_epoch(dist_dataset):
    def replica_fn(inputs):
      # Process the per-replica batch; a stand-in computation.
      return tf.reduce_sum(inputs)

    total_result = tf.constant(0.0)
    for x in dist_dataset:
      # Run replica_fn on every replica, then sum the per-replica results.
      per_replica_result = my_strategy.run(replica_fn, args=(x,))
      total_result += my_strategy.reduce(tf.distribute.ReduceOp.SUM,
                                         per_replica_result, axis=None)
    return total_result

  dist_dataset = my_strategy.experimental_distribute_dataset(dataset)
  for _ in range(EPOCHS):
    train_result = distribute_train_epoch(dist_dataset)

This takes an ordinary dataset and replica_fn and runs it distributed using a particular tf.distribute.Strategy.
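
To see what run and reduce do in isolation, here is a minimal sketch; the replica-id computation and the MirroredStrategy choice are illustrative assumptions:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # illustrative choice

@tf.function
def replica_ids_summed():
  def replica_fn():
    # Each replica reports its own id within the sync group.
    return tf.distribute.get_replica_context().replica_id_in_sync_group
  # run() executes replica_fn once per replica and returns per-replica values.
  per_replica = strategy.run(replica_fn)
  # reduce() combines the per-replica values into a single tensor.
  return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

With N replicas in sync, this returns 0 + 1 + ... + (N - 1).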