tf.contrib.distribute.MultiWorkerAllReduce

All-reduce algorithms for distributed TensorFlow.

Inherits From: AllReduceCrossDeviceOps

Args
worker_devices a list of device strings for workers participating in the all-reduce.
num_gpus_per_worker number of GPU devices per worker.
all_reduce_spec a tuple or a named tuple or a list of tuples specifying the all-reduce algorithm.

  1. The first element of a tuple is the name of the all-reduce algorithm. Valid algorithm names are: "nccl", "nccl/xring", "nccl/rechd", "nccl/pscpu", "xring", "pscpu", "psgpu", "pscpu/pscpu". Algorithms with a "/" are hierarchical, so two all-reduces are executed, the first one aggregates tensors within a worker and the second aggregates across workers.
  2. The second element of a tuple is the number of shards when doing all-reduce. Say its value is M: each tensor, after packing, will be split into M shards, then M parallel all-reduces are performed, and finally the shards are concatenated back into a complete tensor.
  3. The third element is the maximum size of tensors to which the algorithm specified by the first element applies. For example, if all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)], tensors no larger than 1024 bytes are reduced with a 2-shard "nccl" all-reduce, and larger tensors with a 2-shard "pscpu/pscpu" algorithm. The third elements should be in increasing order across tuples and end with -1, which indicates infinity.
num_packs see AllReduceCrossDeviceOps.
agg_small_grads_max_bytes see AllReduceCrossDeviceOps.
agg_small_grads_max_group see AllReduceCrossDeviceOps.
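The size-based dispatch described by all_reduce_spec can be illustrated with a small, self-contained sketch. The choose_algorithm helper below is hypothetical (not part of the TensorFlow API); it only models how a tensor is matched against the first tuple whose size limit covers it:

```python
def choose_algorithm(all_reduce_spec, tensor_size_bytes):
    """Hypothetical helper: pick the (algorithm, num_shards) entry whose
    size limit covers the tensor. A limit of -1 means infinity."""
    for algorithm, num_shards, size_limit in all_reduce_spec:
        if size_limit == -1 or tensor_size_bytes <= size_limit:
            return algorithm, num_shards
    raise ValueError("all_reduce_spec must end with a -1 size limit")

spec = [("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)]
# Tensors up to 1024 bytes use the 2-shard "nccl" all-reduce.
assert choose_algorithm(spec, 512) == ("nccl", 2)
# Larger tensors fall through to the 2-shard "pscpu/pscpu" algorithm.
assert choose_algorithm(spec, 4096) == ("pscpu/pscpu", 2)
```
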

Methods

batch_reduce

Reduce PerReplica objects in a batch.

Reduces each first element in value_destination_pairs to the corresponding second element, which indicates the destinations.

Args
reduce_op Indicates how per_replica_value will be reduced. Accepted values are tf.distribute.ReduceOp.SUM, tf.distribute.ReduceOp.MEAN.
value_destination_pairs a list or a tuple of tuples of PerReplica objects (or tensors with device set if there is one device) and destinations.

Returns
a list of Mirrored objects.

Raises
ValueError if value_destination_pairs is not a list or a tuple of tuples of PerReplica objects and destinations.
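The pairing convention can be sketched without a cluster. The following pure-Python toy (not the TensorFlow implementation) models a per-replica value as a dict from device strings to values, and a Mirrored result as the reduced value placed on every destination:

```python
def toy_batch_reduce(reduce_op, value_destination_pairs):
    """Toy model of batch_reduce semantics, not the real implementation:
    reduce each per-replica dict, then mirror the result onto the
    destinations it is paired with."""
    results = []
    for per_replica, destinations in value_destination_pairs:
        total = sum(per_replica.values())
        if reduce_op == "MEAN":
            total /= len(per_replica)
        # A Mirrored-like result: the same reduced value on every destination.
        results.append({device: total for device in destinations})
    return results

pairs = [({"/gpu:0": 1.0, "/gpu:1": 3.0}, ["/cpu:0"])]
assert toy_batch_reduce("SUM", pairs) == [{"/cpu:0": 4.0}]
assert toy_batch_reduce("MEAN", pairs) == [{"/cpu:0": 2.0}]
```
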

broadcast

Broadcast the tensor to destinations.

Args
tensor the tensor to broadcast.
destinations the broadcast destinations.

Returns
a Mirrored object.
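The semantics can be modeled in a few lines (a pure-Python toy, not the TensorFlow implementation): broadcasting places the same tensor on every destination device, which is exactly what a Mirrored value represents:

```python
def toy_broadcast(tensor, destinations):
    """Toy model of broadcast semantics, not the real implementation:
    the same tensor is placed on every destination device."""
    return {device: tensor for device in destinations}

assert toy_broadcast(7, ["/gpu:0", "/gpu:1"]) == {"/gpu:0": 7, "/gpu:1": 7}
```
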

reduce

Reduce per_replica_value to destinations.

It runs the reduction operation defined by reduce_op and puts the result on destinations.

Args
reduce_op Indicates how per_replica_value will be reduced. Accepted values are tf.distribute.ReduceOp.SUM, tf.distribute.ReduceOp.MEAN.
per_replica_value a PerReplica object or a tensor with device set.
destinations the reduction destinations.

Returns
a Mirrored object.

Raises
ValueError if per_replica_value can't be converted to a PerReplica object.
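The difference between the accepted reduce_op values can be sketched with a pure-Python toy (not the TensorFlow implementation), again modeling a per-replica value as a dict from device strings to values:

```python
def toy_reduce(reduce_op, per_replica_value, destinations):
    """Toy model of reduce semantics, not the real implementation:
    combine the per-replica components with reduce_op, then mirror the
    result onto the destinations."""
    total = sum(per_replica_value.values())
    if reduce_op == "MEAN":
        total /= len(per_replica_value)
    return {device: total for device in destinations}

value = {"/gpu:0": 2.0, "/gpu:1": 4.0}
assert toy_reduce("SUM", value, ["/cpu:0"]) == {"/cpu:0": 6.0}
assert toy_reduce("MEAN", value, ["/cpu:0"]) == {"/cpu:0": 3.0}
```
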