NCCL all-reduce implementation of CrossDeviceOps.

Inherits From: CrossDeviceOps

It uses NVIDIA NCCL for all-reduce. For the batch API, tensors are repacked or aggregated so that they can be transferred across devices more efficiently.
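As a rough illustration of what splitting values into packs can look like, the sketch below groups a list of values into contiguous packs in plain Python. The helper `split_into_packs` is hypothetical and is not TensorFlow's internal packing code; it assumes near-equal contiguous groups and treats zero packs as "no packing".

```python
def split_into_packs(values, num_packs):
    """Group `values` into at most `num_packs` contiguous packs.

    Hypothetical helper for illustration only; TensorFlow's actual
    packing logic may differ.
    """
    if num_packs < 0:
        raise ValueError("num_packs must be non-negative")
    if not values:
        return []
    if num_packs == 0:
        # Assumed meaning of zero: no packing, each value travels alone.
        return [[v] for v in values]
    num_packs = min(num_packs, len(values))
    base, extra = divmod(len(values), num_packs)
    packs, start = [], 0
    for i in range(num_packs):
        # The first `extra` packs take one additional value.
        size = base + (1 if i < extra else 0)
        packs.append(values[start:start + size])
        start += size
    return packs


if __name__ == "__main__":
    print(split_into_packs(["a", "b", "c", "d", "e"], 2))
    print(split_into_packs(["a", "b", "c"], 0))
```

Fewer, larger packs mean fewer cross-device transfers, which is the motivation the paragraph above gives for repacking.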

For reduces that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.
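The difference between an all-reduce and a reduction to one device can be sketched conceptually in plain Python. The functions below are hypothetical and only model the results each approach produces; they do not call NCCL or TensorFlow.

```python
def all_reduce_sum(per_device_values):
    """Every device ends up holding the sum of all devices' values."""
    total = sum(per_device_values)
    return [total for _ in per_device_values]


def reduce_to_one_device_sum(per_device_values, destination=0):
    """Only the destination device holds the sum (conceptual model)."""
    if not 0 <= destination < len(per_device_values):
        raise IndexError("destination is not a valid device index")
    return {destination: sum(per_device_values)}


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0]  # one value per simulated device
    print(all_reduce_sum(values))
    print(reduce_to_one_device_sum(values, destination=1))
```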

Here is how you can use NcclAllReduce in tf.distribute.MirroredStrategy:

  strategy = tf.distribute.MirroredStrategy(
      cross_device_ops=tf.distribute.NcclAllReduce())

num_packs: a non-negative integer. The number of packs to split values into. If zero, no packing will be done.