NCCL all-reduce implementation of CrossDeviceOps.

Inherits From: CrossDeviceOps

It uses NVIDIA NCCL for all-reduce. For the batch API, tensors will be repacked or aggregated for more efficient cross-device transportation.

For reduces that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.

Here is how you can use NcclAllReduce in tf.distribute.MirroredStrategy:

  strategy = tf.distribute.MirroredStrategy(
      cross_device_ops=tf.distribute.NcclAllReduce())
Args:
  num_packs: a non-negative integer. The number of packs to split values into. If zero, no packing will be done.

Raises:
  ValueError: if num_packs is negative.
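A minimal construction sketch. It assumes TensorFlow is installed; constructing the op does not itself require a GPU, though using it for reductions does.

```python
import tensorflow as tf

# Default packing: values in a batch are packed before the NCCL all-reduce.
nccl_ops = tf.distribute.NcclAllReduce(num_packs=1)

# num_packs=0 disables packing entirely.
unpacked_ops = tf.distribute.NcclAllReduce(num_packs=0)

# A negative num_packs is rejected at construction time with a ValueError.
try:
    tf.distribute.NcclAllReduce(num_packs=-1)
except ValueError as err:
    print("rejected:", err)
```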



batch_reduce

Reduce values to destinations in batches.

See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.

Args:
  reduce_op: a tf.distribute.ReduceOp specifying how values should be combined.
  value_destination_pairs: a sequence of (value, destinations) pairs. See tf.distribute.CrossDeviceOps.reduce for descriptions.
  options: a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details.

Returns:
  A list of tf.Tensor or tf.distribute.DistributedValues, one per pair in value_destination_pairs.

Raises:
  ValueError: if value_destination_pairs is not an iterable of tuples of tf.distribute.DistributedValues and destinations.
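A batched-reduce sketch through the strategy API, which dispatches to the cross-device op. It assumes a machine with at least two NVIDIA GPUs visible to TensorFlow; NcclAllReduce will not work on CPU-only hosts, and the replica values here are illustrative.

```python
import tensorflow as tf

# Assumes >= 2 GPUs; MirroredStrategy routes reductions through NcclAllReduce.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

@tf.function
def step():
    def replica_fn():
        # Each replica contributes one value to the reduction.
        return tf.constant(1.0)

    per_replica = strategy.run(replica_fn)
    # Called in the cross-replica context; each (value, destinations) pair in
    # the batch is reduced with ReduceOp.SUM.
    return strategy.extended.batch_reduce_to(
        tf.distribute.ReduceOp.SUM,
        [(per_replica, per_replica)])

print(step())
```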


broadcast

Broadcast tensor to destinations.

This can only be called in the cross-replica context.
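A broadcast sketch, again assuming at least two NVIDIA GPUs; the device strings and tensor value are illustrative.

```python
import tensorflow as tf

nccl_ops = tf.distribute.NcclAllReduce()

# Source tensor lives on one device.
with tf.device("/gpu:0"):
    t = tf.constant([1.0, 2.0])

# Broadcast copies the tensor to the destination device(s); destinations may
# be given as a device string.
result = nccl_ops.broadcast(t, "/gpu:1")
```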