All-reduce algorithms for distributed TensorFlow.
Inherits From: AllReduceCrossDeviceOps
tf.contrib.distribute.MultiWorkerAllReduce(
    worker_devices, num_gpus_per_worker, all_reduce_spec=('pscpu/pscpu', 2, -1),
    num_packs=0, agg_small_grads_max_bytes=0, agg_small_grads_max_group=10
)
Args

worker_devices: a list of device strings for workers participating in
  all-reduce.
num_gpus_per_worker: number of GPU devices per worker.
all_reduce_spec: a tuple, a named tuple, or a list of tuples specifying the
  all-reduce algorithm (see the constructor sketch after this list).
  - The first element of a tuple is the name of the all-reduce algorithm.
    Valid algorithm names are: "nccl", "nccl/xring", "nccl/rechd",
    "nccl/pscpu", "xring", "pscpu", "psgpu", "pscpu/pscpu". Algorithms with
    a "/" are hierarchical: two all-reduces are executed, the first
    aggregating tensors within a worker and the second aggregating across
    workers.
  - The second element of a tuple is the number of shards for the
    all-reduce. If its value is M, each tensor is split into M shards after
    packing, M parallel all-reduces are performed, and the shards are then
    concatenated back into a complete tensor.
  - The third element is the maximum size of tensors to which the algorithm
    named by the first element applies. For example, if
    all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)], tensors no
    larger than 1024 bytes get a 2-shard "nccl" all-reduce and all other
    tensors get a 2-shard "pscpu/pscpu" all-reduce. The third elements
    should be in increasing order across tuples and end with -1, which
    indicates infinity.
num_packs: see AllReduceCrossDeviceOps.
agg_small_grads_max_bytes: see AllReduceCrossDeviceOps.
agg_small_grads_max_group: see AllReduceCrossDeviceOps.
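For illustration, a minimal construction sketch under assumed settings (the
worker device strings and the two-GPU count are hypothetical), combining the
two-tier spec described above:

import tensorflow as tf

# Two hypothetical workers, each assumed to have 2 GPUs.
worker_devices = ["/job:worker/task:0", "/job:worker/task:1"]

multi_worker_all_reduce = tf.contrib.distribute.MultiWorkerAllReduce(
    worker_devices,
    num_gpus_per_worker=2,
    # Tensors no larger than 1024 bytes use a 2-shard "nccl" all-reduce; all
    # larger tensors fall through to the hierarchical "pscpu/pscpu" algorithm.
    # Thresholds increase across tuples and end with -1 (infinity).
    all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)])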
Methods
batch_reduce
batch_reduce(
    reduce_op, value_destination_pairs
)
Reduce PerReplica objects in a batch.
Reduces each first element of value_destination_pairs to the corresponding
second element, which indicates the destinations.
Args

reduce_op: indicates how per_replica_value will be reduced. Accepted values
  are tf.distribute.ReduceOp.SUM and tf.distribute.ReduceOp.MEAN.
value_destination_pairs: a list or a tuple of tuples of PerReplica objects
  (or tensors with device set if there is one device) and destinations.
Returns

a list of Mirrored objects.

Raises

ValueError: if value_destination_pairs is not a list or a tuple of tuples of
  PerReplica objects and destinations.
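A hedged usage sketch: multi_worker_all_reduce is the instance constructed
above, and the PerReplica values and destination device strings are
placeholders assumed to come from a distribution strategy's per-replica
computation.

# Hypothetical PerReplica gradients paired with the destinations that should
# hold the reduced results.
value_destination_pairs = [
    (per_replica_grad_0, "/job:worker/task:0/device:GPU:0"),
    (per_replica_grad_1, "/job:worker/task:0/device:GPU:1"),
]

# Sum each PerReplica value onto its paired destination; the result is a
# list of Mirrored objects, one per input pair.
reduced = multi_worker_all_reduce.batch_reduce(
    tf.distribute.ReduceOp.SUM, value_destination_pairs)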
broadcast
broadcast(
    tensor, destinations
)
Broadcast the tensor to destinations.
Args

tensor: the tensor to broadcast.
destinations: the broadcast destinations.

Returns

a Mirrored object.
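A brief sketch, reusing multi_worker_all_reduce from above; dest is a
placeholder for a valid destination (for example, a mirrored variable or a
device string) and is an assumption, not something defined by this API.

# Copy a constant to the destinations; the result is a Mirrored object whose
# components live on the destination devices. `dest` is a placeholder.
mirrored_value = multi_worker_all_reduce.broadcast(
    tf.constant([0.0, 0.0]), dest)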
reduce
reduce(
    reduce_op, per_replica_value, destinations
)
Reduce per_replica_value to destinations.
It runs the reduction operation defined by reduce_op and puts the result on
destinations.
Returns

a Mirrored object.

Raises

ValueError: if per_replica_value can't be converted to a PerReplica object.
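A hedged sketch of a single reduction: per_replica_grad and mirrored_var are
hypothetical stand-ins for a PerReplica gradient and a mirrored variable used
as the destination.

# Average the per-replica gradient and place the result on the devices of
# the destination variable; the return value is a Mirrored object.
mean_grad = multi_worker_all_reduce.reduce(
    tf.distribute.ReduceOp.MEAN, per_replica_grad, mirrored_var)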