All-reduce algorithms for distributed TensorFlow.
Inherits From: AllReduceCrossDeviceOps
tf.contrib.distribute.MultiWorkerAllReduce(
    worker_devices, num_gpus_per_worker, all_reduce_spec=('pscpu/pscpu', 2, -1),
    num_packs=0, agg_small_grads_max_bytes=0, agg_small_grads_max_group=10
)
Args

worker_devices: a list of device strings for workers participating in
  all-reduce.
num_gpus_per_worker: number of GPU devices per worker.
all_reduce_spec: a tuple, a named tuple, or a list of tuples specifying the
  all-reduce algorithm (see the constructor sketch after this list).
  - The first element of a tuple is the name of the all-reduce algorithm.
    Valid algorithm names are: "nccl", "nccl/xring", "nccl/rechd",
    "nccl/pscpu", "xring", "pscpu", "psgpu", "pscpu/pscpu". Algorithms with
    a "/" are hierarchical: two all-reduces are executed, the first
    aggregating tensors within a worker and the second aggregating across
    workers.
  - The second element of a tuple is the number of shards for the
    all-reduce. If its value is M, each tensor is split into M shards after
    packing, M parallel all-reduces are performed, and the shards are then
    concatenated back into a complete tensor.
  - The third element is the maximum size of tensors to which the algorithm
    named by the first element applies. For example, if
    all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)], tensors no
    larger than 1024 bytes get a 2-shard "nccl" all-reduce and all other
    tensors get a 2-shard "pscpu/pscpu" all-reduce. The third elements
    should be in increasing order across tuples and end with -1, which
    indicates infinity.
num_packs: see AllReduceCrossDeviceOps.
agg_small_grads_max_bytes: see AllReduceCrossDeviceOps.
agg_small_grads_max_group: see AllReduceCrossDeviceOps.
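For illustration, a minimal construction sketch under assumed settings (the
worker device strings and the two-GPU count are hypothetical), combining the
two-tier spec described above:

import tensorflow as tf

# Two hypothetical workers, each assumed to have 2 GPUs.
worker_devices = ["/job:worker/task:0", "/job:worker/task:1"]

multi_worker_all_reduce = tf.contrib.distribute.MultiWorkerAllReduce(
    worker_devices,
    num_gpus_per_worker=2,
    # Tensors no larger than 1024 bytes use a 2-shard "nccl" all-reduce; all
    # larger tensors fall through to the hierarchical "pscpu/pscpu" algorithm.
    # Thresholds increase across tuples and end with -1 (infinity).
    all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)])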
Methods
batch_reduce
batch_reduce(
    reduce_op, value_destination_pairs
)
Reduce PerReplica objects in a batch.
Reduces each first element of value_destination_pairs to the corresponding
second element, which indicates the destinations.
Args

reduce_op: indicates how per_replica_value will be reduced. Accepted values
  are tf.distribute.ReduceOp.SUM and tf.distribute.ReduceOp.MEAN.
value_destination_pairs: a list or a tuple of tuples of PerReplica objects
  (or tensors with device set if there is one device) and destinations.
Returns

a list of Mirrored objects.

Raises

ValueError: if value_destination_pairs is not a list or a tuple of tuples of
  PerReplica objects and destinations.
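A hedged usage sketch: multi_worker_all_reduce is the instance constructed
above, and the PerReplica values and destination device strings are
placeholders assumed to come from a distribution strategy's per-replica
computation.

# Hypothetical PerReplica gradients paired with the destinations that should
# hold the reduced results.
value_destination_pairs = [
    (per_replica_grad_0, "/job:worker/task:0/device:GPU:0"),
    (per_replica_grad_1, "/job:worker/task:0/device:GPU:1"),
]

# Sum each PerReplica value onto its paired destination; the result is a
# list of Mirrored objects, one per input pair.
reduced = multi_worker_all_reduce.batch_reduce(
    tf.distribute.ReduceOp.SUM, value_destination_pairs)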
broadcast
broadcast(
    tensor, destinations
)
Broadcast the tensor to destinations.
Args

tensor: the tensor to broadcast.
destinations: the broadcast destinations.

Returns

a Mirrored object.
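A brief sketch, reusing multi_worker_all_reduce from above; dest is a
placeholder for a valid destination (for example, a mirrored variable or a
device string) and is an assumption, not something defined by this API.

# Copy a constant to the destinations; the result is a Mirrored object whose
# components live on the destination devices. `dest` is a placeholder.
mirrored_value = multi_worker_all_reduce.broadcast(
    tf.constant([0.0, 0.0]), dest)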
reduce
reduce(
    reduce_op, per_replica_value, destinations
)
Reduce per_replica_value to destinations.
It runs the reduction operation defined by reduce_op and puts the result on
destinations.
Returns

a Mirrored object.

Raises

ValueError: if per_replica_value can't be converted to a PerReplica object.
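A hedged sketch of a single reduction: per_replica_grad and mirrored_var are
hypothetical stand-ins for a PerReplica gradient and a mirrored variable used
as the destination.

# Average the per-replica gradient and place the result on the devices of
# the destination variable; the return value is a Mirrored object.
mean_grad = multi_worker_all_reduce.reduce(
    tf.distribute.ReduceOp.MEAN, per_replica_grad, mirrored_var)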