Hierarchical copy all-reduce implementation of CrossDeviceOps.
Compat aliases for migration
See Migration guide for more details.
tf.compat.v1.distribute.HierarchicalCopyAllReduce
tf.distribute.HierarchicalCopyAllReduce(num_packs=1)
It reduces to one GPU along edges in some hierarchy and broadcasts back to each GPU along the same path. For the batch API, tensors will be repacked or aggregated for more efficient cross-device transportation.
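Conceptually, the pattern is a tree-style reduce followed by a broadcast back along the same path. The sketch below is illustrative only, not the library's implementation; it assumes a plain Python list of per-GPU tensors as input:

```python
# Illustrative sketch of hierarchical reduce-then-broadcast (not the actual
# implementation used by HierarchicalCopyAllReduce).
import tensorflow as tf

def hierarchical_all_reduce_sketch(per_gpu_values):
  """Pairwise-reduce a list of tensors, then copy the result back to every slot."""
  values = list(per_gpu_values)
  # Reduce along tree edges: combine neighbouring pairs until one value remains.
  while len(values) > 1:
    values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
              for i in range(0, len(values), 2)]
  total = values[0]
  # Broadcast the reduced value back to each GPU slot.
  return [tf.identity(total) for _ in per_gpu_values]

print(hierarchical_all_reduce_sketch(
    [tf.constant(1.0), tf.constant(2.0), tf.constant(3.0), tf.constant(4.0)]))
```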
This is a reduction created for the Nvidia DGX-1; it assumes GPUs are connected the way they are on a DGX-1 machine. If you have different GPU interconnections, it is likely to be slower than tf.distribute.ReductionToOneDevice.
For reduces that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.
Here is how you can use HierarchicalCopyAllReduce in tf.distribute.MirroredStrategy:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
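As a minimal follow-up sketch, the strategy can then be used like any other MirroredStrategy; the model and data below are placeholders, not part of the original example:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
  # Variables created here are mirrored across the available GPUs.
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
  model.compile(optimizer="sgd", loss="mse")

# Gradient aggregation across GPUs uses the hierarchical-copy all-reduce.
model.fit(tf.random.normal((32, 4)), tf.random.normal((32, 1)), epochs=1)
```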
| Args | |
|---|---|
| `num_packs` | A non-negative integer. The number of packs to split values into. If zero, no packing will be done. |
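For illustration, a sketch of passing a non-default `num_packs` (the value 2 here is arbitrary, not a recommendation from this page):

```python
import tensorflow as tf

# num_packs=2 splits values into two packs before cross-device transport;
# the specific value is illustrative only.
cross_ops = tf.distribute.HierarchicalCopyAllReduce(num_packs=2)
strategy = tf.distribute.MirroredStrategy(cross_device_ops=cross_ops)
```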
Methods
batch_reduce(reduce_op, value_destination_pairs, options=None)
Reduce values to destinations in batches.
See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.
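As a hedged sketch of the batched reduction path, the example below goes through tf.distribute.StrategyExtended.batch_reduce_to, the public entry point this method backs; the value function and the choice of ReduceOp.SUM are illustrative assumptions, not taken from this page:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

def value_fn(ctx):
  # Each replica contributes its replica id as a float value (illustrative).
  return tf.constant(float(ctx.replica_id_in_sync_group))

per_replica = strategy.experimental_distribute_values_from_function(value_fn)

# Sum the per-replica values in one batched call; using the per-replica value
# itself as the destination keeps the result mirrored on the same devices.
reduced = strategy.extended.batch_reduce_to(
    tf.distribute.ReduceOp.SUM, [(per_replica, per_replica)])
print(reduced)
```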