tf.distribute.experimental.CommunicationOptions

Options for cross device communications like All-reduce.

This can be passed to methods like tf.distribute.get_replica_context().all_reduce() to optimize collective operation performance. Note that these are only hints, which may or may not change the actual behavior. Some options only apply to certain strategies and are ignored by others.

One common optimization is to break the gradient all-reduce into multiple packs so that weight updates can overlap with the gradient all-reduce.

Examples:

options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=50 * 1024 * 1024,
    timeout_seconds=120,
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL
)
# All-reduce the gradients explicitly with the communication hints, then
# disable the optimizer's own cross-replica aggregation so the gradients
# are not reduced twice.
grads = tf.distribute.get_replica_context().all_reduce(
    'sum', grads, options=options)
optimizer.apply_gradients(zip(grads, vars),
    experimental_aggregate_gradients=False)
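
For context, here is a minimal, self-contained sketch of how the snippet above might sit inside a custom training step run under tf.distribute.MultiWorkerMirroredStrategy. The model, optimizer, loss, data shapes, and the names step_fn and train_step are illustrative assumptions, not part of this API.

import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Assumed toy model and optimizer, for illustration only.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    optimizer = tf.keras.optimizers.SGD(0.01)

options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=50 * 1024 * 1024,
    timeout_seconds=120)

@tf.function
def train_step(x, y):
    def step_fn(x, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.keras.losses.mse(y, model(x)))
        grads = tape.gradient(loss, model.trainable_variables)
        # Explicit all-reduce in the replica context, with the hints.
        grads = tf.distribute.get_replica_context().all_reduce(
            'sum', grads, options=options)
        # Gradients are already aggregated, so skip the optimizer's own
        # aggregation (matches the legacy Keras optimizer keyword used
        # in the example above).
        optimizer.apply_gradients(
            zip(grads, model.trainable_variables),
            experimental_aggregate_gradients=False)
        return loss
    return strategy.run(step_fn, args=(x, y))

train_step(tf.random.normal([8, 4]), tf.random.normal([8, 10]))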

Attributes:

bytes_per_pack: a non-negative integer. Breaks collective operations into packs of the given size. If it is zero, the pack size is determined automatically. This currently only applies to all-reduce with MultiWorkerMirroredStrategy.
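
Beyond per-call hints, the same options object can also be passed to tf.distribute.MultiWorkerMirroredStrategy via its communication_options constructor argument, so that the pack-size hint applies to the collectives the strategy launches itself (for example, gradient aggregation during Model.fit). A minimal sketch of that pattern, assuming a default single-worker setup:

import tensorflow as tf

# Hint the pack size for the strategy's own all-reduces. Strategies other
# than MultiWorkerMirroredStrategy may ignore this hint.
options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=50 * 1024 * 1024)
strategy = tf.distribute.MultiWorkerMirroredStrategy(
    communication_options=options)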