

Hierarchical copy all-reduce implementation of CrossDeviceOps.

Inherits From: CrossDeviceOps

It reduces to one GPU along edges in some hierarchy and broadcasts back to each GPU along the same path. For the batch API, tensors will be repacked or aggregated for more efficient cross-device transportation.
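The reduce-then-broadcast pattern described above can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions, not TensorFlow's implementation: the function `hierarchical_all_reduce`, the `parent` map, and the four-GPU hierarchy are all hypothetical, and plain numbers stand in for tensors.

```python
from collections import defaultdict


def hierarchical_all_reduce(values, parent):
    """Toy sum all-reduce over a tree of devices (illustrative only).

    values: dict mapping device name -> local number.
    parent: dict mapping device name -> parent device (the root maps to None).
    Returns a dict mapping every device to the global sum.
    """
    # Build the child lists and locate the root of the hierarchy.
    children = defaultdict(list)
    root = None
    for device, up in parent.items():
        if up is None:
            root = device
        else:
            children[up].append(device)

    def reduce_up(device):
        # Combine this device's value with the partial sums of its subtree.
        return values[device] + sum(reduce_up(c) for c in children[device])

    total = reduce_up(root)

    result = {}

    def broadcast_down(device):
        # Send the total back along the same edges used for the reduction.
        result[device] = total
        for c in children[device]:
            broadcast_down(c)

    broadcast_down(root)
    return result


# Hypothetical hierarchy: gpu1 and gpu2 reduce into gpu0, gpu3 reduces into gpu2.
parent = {"gpu0": None, "gpu1": "gpu0", "gpu2": "gpu0", "gpu3": "gpu2"}
values = {"gpu0": 1.0, "gpu1": 2.0, "gpu2": 3.0, "gpu3": 4.0}
reduced = hierarchical_all_reduce(values, parent)
```

After the call, every device in `reduced` holds the same total, which is the defining property of an all-reduce. The real class works on tensors and on a hierarchy chosen for DGX-1 hardware, so the shape of the tree here should not be read as the actual topology.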

This reduction was created for the NVIDIA DGX-1 and assumes that the GPUs are connected as they are on a DGX-1 machine. If your GPUs are interconnected differently, it is likely to be slower than tf.distribute.ReductionToOneDevice.

For reductions that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.

Here is how you can use HierarchicalCopyAllReduce in tf.distribute.MirroredStrategy:

  strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

num_packs: a non-negative integer. The number of packs to split values into. If zero, no packing will be done.

Raises ValueError if num_packs is negative.
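To make the `num_packs` contract concrete, here is a minimal pure-Python sketch of splitting a list of values into packs. The helper `split_into_packs` is hypothetical and the contiguous, near-equal chunking is an assumption chosen for illustration. It is not how TensorFlow packs tensors internally. It only mirrors the documented behavior: zero means no packing, and a negative value raises `ValueError`.

```python
def split_into_packs(values, num_packs):
    """Toy illustration of grouping values into num_packs packs.

    Hypothetical helper: contiguous chunking is an assumption, not
    TensorFlow's actual packing strategy.
    """
    if num_packs < 0:
        raise ValueError("num_packs must be non-negative, got %d" % num_packs)
    if num_packs == 0:
        # No packing: every value is transferred on its own.
        return [[v] for v in values]

    # Never create more packs than there are values.
    packs = min(num_packs, len(values))
    if packs == 0:
        return []

    # Distribute values into contiguous chunks of near-equal size.
    base, extra = divmod(len(values), packs)
    result, start = [], 0
    for i in range(packs):
        size = base + (1 if i < extra else 0)
        result.append(values[start:start + size])
        start += size
    return result


# Placeholder names standing in for five gradient tensors.
grads = ["g0", "g1", "g2", "g3", "g4"]
packed = split_into_packs(grads, 2)
unpacked = split_into_packs(grads, 0)
```

Fewer, larger packs mean fewer cross-device transfers, which is the motivation the description gives for repacking in the batch API.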




Reduce values to destinations in batches.

See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.

reduce_op: a tf.distribute.ReduceOp specifying how values should be combined.
value_destination_pairs: a seq