tf.distribute.NcclAllReduce

NCCL all-reduce implementation of CrossDeviceOps.

Compat aliases for migration

tf.compat.v1.distribute.NcclAllReduce

tf.distribute.NcclAllReduce(
    num_packs=1
)

It uses NVIDIA NCCL for all-reduce. For the batch API, tensors are repacked or aggregated for more efficient cross-device transport.

For reductions that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.
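To make the distinction between an all-reduce and a reduction to one device concrete, here is a minimal pure-Python sketch. It does not use TensorFlow or NCCL: the helper names `all_reduce_sum` and `reduce_to_one_device` and the device strings are hypothetical, and plain floats stand in for tensors. It only models where the result ends up, not how data is moved.

```python
def all_reduce_sum(per_replica):
    """Model an all-reduce: every device ends up holding the same sum."""
    total = sum(per_replica.values())
    return {device: total for device in per_replica}


def reduce_to_one_device(per_replica, destination):
    """Model a reduction whose result lives on a single destination device."""
    return {destination: sum(per_replica.values())}


# One scalar per replica; plain floats stand in for tensors (assumption).
values = {"GPU:0": 1.0, "GPU:1": 2.0, "GPU:2": 3.0}

all_reduced = all_reduce_sum(values)             # result on every device
single = reduce_to_one_device(values, "GPU:0")   # result on one device only
```

In this model, the all-reduce result has one entry per participating device, while the reduction to one device has a single entry.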

Here is how you can use NcclAllReduce in tf.distribute.MirroredStrategy:

  strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

Args
`num_packs`	a non-negative integer. The number of packs to split values into. If zero, no packing will be done.

Raises
`ValueError`	if `num_packs` is negative.
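The contract of `num_packs` can be sketched in plain Python as follows. This is an illustration only and is not TensorFlow's actual packing logic: the function `split_into_packs` is hypothetical, the near-even contiguous split is an assumption, and "no packing" is modeled as each value forming its own group.

```python
def split_into_packs(values, num_packs):
    """Group `values` into at most `num_packs` contiguous packs (illustrative)."""
    if num_packs < 0:
        # Mirrors the documented behavior: negative values are rejected.
        raise ValueError("num_packs must be non-negative, got %d" % num_packs)
    if num_packs == 0:
        # Modeled as "no packing": every value is handled on its own.
        return [[v] for v in values]

    # Never create more packs than there are values (at least one pack).
    num_packs = min(num_packs, len(values)) or 1
    base, extra = divmod(len(values), num_packs)

    packs, start = [], 0
    for i in range(num_packs):
        # The first `extra` packs take one additional element.
        size = base + (1 if i < extra else 0)
        packs.append(values[start:start + size])
        start += size
    return packs


two_packs = split_into_packs([0, 1, 2, 3, 4], 2)
unpacked = split_into_packs([0, 1, 2], 0)
```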

Methods

batch_reduce(
    reduce_op, value_destination_pairs, options=None
)

Reduce values to destinations in batches.

See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.

Args
`reduce_op`	a `tf.distribute.ReduceOp` specifying how values should be combined.
`value_destination_pairs`	a sequence of (value, destinations) pairs. See `tf.distribute.CrossDeviceOps.reduce` for descriptions.
`options`	a `tf.distribute.experimental.CommunicationOptions`. See `tf.distribute.experimental.CommunicationOptions` for details.

Returns
A list of `tf.Tensor` or `tf.distribute.DistributedValues`, one per pair in `value_destination_pairs`.

Raises
`ValueError`	if `value_destination_pairs` is not an iterable of tuples of `tf.distribute.DistributedValues` and destinations.
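The shape of `value_destination_pairs` and of the returned list can be modeled with a short pure-Python sketch. It does not call TensorFlow: `batch_reduce_sum` is a hypothetical helper, lists of floats stand in for per-replica values, and lists of device strings stand in for destinations.

```python
def batch_reduce_sum(value_destination_pairs):
    """Sum each per-replica value and place the result on its destinations."""
    results = []
    for pair in value_destination_pairs:
        # Each entry must be a (value, destinations) tuple.
        if not isinstance(pair, tuple) or len(pair) != 2:
            raise ValueError("each entry must be a (value, destinations) tuple")
        per_replica, destinations = pair
        total = sum(per_replica)
        results.append({device: total for device in destinations})
    return results


pairs = [
    ([1.0, 2.0], ["GPU:0", "GPU:1"]),  # reduced onto both devices
    ([10.0, 20.0], ["GPU:0"]),         # reduced onto a single device
]
reduced = batch_reduce_sum(pairs)
```

The model returns exactly one result per input pair, in the same order as the pairs were given.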

broadcast(
    tensor, destinations
)

Broadcast tensor to destinations.

This can only be called in the cross-replica context.

Args
`tensor`	a `tf.Tensor`-like object. The value to broadcast.
`destinations`	a `tf.distribute.DistributedValues`, a `tf.Variable`, a `tf.Tensor`-like object, or a device string. It specifies the devices to broadcast to. Note that if it's a `tf.Variable`, the value is broadcast to the devices of that variable; this method doesn't update the variable.

Returns
A `tf.Tensor` or `tf.distribute.DistributedValues`.

reduce(
    reduce_op, per_replica_value, destinations, options=None
)

Reduce per_replica_value to destinations.

See tf.distribute.StrategyExtended.reduce_to. This can only be called in the cross-replica context.

Args
`reduce_op`	a `tf.distribute.ReduceOp` specifying how values should be combined.
`per_replica_value`	a `tf.distribute.DistributedValues`, or a `tf.Tensor`-like object.
`destinations`	a `tf.distribute.DistributedValues`, a `tf.Variable`, a `tf.Tensor`-like object, or a device string. It specifies the devices to reduce to. To perform an all-reduce, pass the same object as both `per_replica_value` and `destinations`. Note that if it's a `tf.Variable`, the value is reduced to the devices of that variable, and this method doesn't update the variable.
`options`	a `tf.distribute.experimental.CommunicationOptions`. See `tf.distribute.experimental.CommunicationOptions` for details.

Returns
A `tf.Tensor` or `tf.distribute.DistributedValues`.

Raises
`ValueError`	if `per_replica_value` can't be converted to a `tf.distribute.DistributedValues` or if `destinations` is not a string, `tf.Variable` or `tf.distribute.DistributedValues`.
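As a rough illustration of how `reduce_op` changes the result, the sketch below models the `SUM` and `MEAN` members of `tf.distribute.ReduceOp` in plain Python. The enum `ReduceOpModel` and the function `reduce_values` are hypothetical stand-ins, not TensorFlow APIs, and floats stand in for per-replica tensors.

```python
from enum import Enum


class ReduceOpModel(Enum):
    """Hypothetical stand-in for tf.distribute.ReduceOp."""
    SUM = "SUM"
    MEAN = "MEAN"


def reduce_values(reduce_op, per_replica_value, destinations):
    """Combine per-replica values and place the result on each destination."""
    total = sum(per_replica_value)
    if reduce_op is ReduceOpModel.SUM:
        result = total
    elif reduce_op is ReduceOpModel.MEAN:
        result = total / len(per_replica_value)
    else:
        raise ValueError("unsupported reduce_op: %r" % (reduce_op,))
    return {device: result for device in destinations}


devices = ["GPU:0", "GPU:1"]
summed = reduce_values(ReduceOpModel.SUM, [2.0, 4.0], devices)
averaged = reduce_values(ReduceOpModel.MEAN, [2.0, 4.0], devices)
```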