tf.contrib.all_reduce.build_shuffle_all_reduce( input_tensors, gather_devices, red_op, un_op=None )
Construct a subgraph for shuffle all-reduce.
Shuffle reduce is essentially the algorithm implemented when using parameter servers. Suppose the tensor length is n, and there are d devices and g gather shards. Each device sends an n/g-length sub-tensor to each gather shard. Each gather shard performs a reduction across the d fragments it receives, then broadcasts the result back to every device. Each device then joins the g fully reduced fragments it receives from the shards. The gather shards could perform d-1 pairwise reductions, or one d-way reduction. The first is better when reduction Op time is low compared to transmission time; the second is better otherwise.
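The data flow described above can be sketched in plain Python, with lists standing in for tensors and devices. This is only an illustrative simulation, not the TensorFlow implementation: the function name `shuffle_all_reduce`, its parameters, and the way uneven lengths are split across shards are assumptions made for this example.

```python
def shuffle_all_reduce(device_tensors, num_shards, red_op):
    """Simulate shuffle all-reduce on plain Python lists (hypothetical helper).

    device_tensors: d equal-length lists, one per device.
    num_shards: g, the number of gather shards.
    red_op: function that takes a list of d numbers and returns one number.
    """
    n = len(device_tensors[0])
    # Split the index range [0, n) into g nearly equal contiguous pieces.
    # (Assumption: how the real op handles n not divisible by g may differ.)
    bounds = [(s * n) // num_shards for s in range(num_shards + 1)]

    reduced_fragments = []
    for s in range(num_shards):
        lo, hi = bounds[s], bounds[s + 1]
        # Scatter: shard s receives the fragment [lo, hi) from every device.
        received = [t[lo:hi] for t in device_tensors]
        # Reduce: one d-way elementwise reduction across the d fragments.
        reduced_fragments.append([red_op(list(col)) for col in zip(*received)])

    # Broadcast and join: every device concatenates the g reduced fragments.
    joined = [x for frag in reduced_fragments for x in frag]
    return [list(joined) for _ in device_tensors]


inputs = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
outputs = shuffle_all_reduce(inputs, num_shards=2, red_op=sum)
print(outputs[0])  # [111, 222, 333, 444]
```

Here d = 3, g = 2 and n = 4, so each shard reduces a fragment of length 2 and every device ends up with the same fully reduced list.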
input_tensors: list of T tf.Tensor values to be reduced.
gather_devices: list of names of devices on which reduction shards should be placed.
red_op: an n-ary elementwise reduction Op.
un_op: optional elementwise unary Op to be applied to fully-reduced values.
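As a rough illustration of how red_op and un_op relate, the sketch below applies an n-ary reduction followed by an optional unary step, which is one way to turn a sum into a mean. It runs on plain Python lists, and `reduce_and_finalize` is a hypothetical helper rather than part of the library. In a real graph, an n-ary Op such as `tf.add_n` would presumably fill the red_op role.

```python
def reduce_and_finalize(fragments, red_op, un_op=None):
    """Apply an n-ary elementwise reduction, then an optional unary op.

    fragments: list of equal-length lists, one per device.
    red_op: function that takes a list of numbers and returns one number.
    un_op: optional function applied to each fully reduced value.
    """
    # n-ary elementwise reduction across all fragments at once.
    reduced = [red_op(list(col)) for col in zip(*fragments)]
    if un_op is not None:
        # The unary op only ever sees fully reduced values.
        reduced = [un_op(x) for x in reduced]
    return reduced


grads = [[2.0, 4.0], [4.0, 8.0]]
total = reduce_and_finalize(grads, red_op=sum)
mean = reduce_and_finalize(grads, red_op=sum, un_op=lambda x: x / len(grads))
print(total, mean)  # [6.0, 12.0] [3.0, 6.0]
```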
list of T tf.Tensor values, which are the fully reduced tensors.