Library for running a computation across multiple devices.
The intent of this library is that you can write an algorithm in a stylized way
and it will be usable with a variety of different tf.distribute.Strategy
implementations. Each descendant will implement a different strategy for
distributing the algorithm across multiple devices/machines. Furthermore, these
changes can be hidden inside the specific layers and other library classes that
need special treatment to run in a distributed setting, so that most users'
model definition code can run unchanged. The
tf.distribute.Strategy API works
the same way with eager and graph execution.
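
As a rough illustration of the point that model definition code can stay unchanged across strategies, the sketch below builds and fits the same small Keras model under two strategies. The helper `train_once`, the layer sizes, and the synthetic data are assumptions made for this example and are not part of the library; only `tf.distribute.MirroredStrategy`, `tf.distribute.get_strategy`, `strategy.scope()`, and `num_replicas_in_sync` come from TensorFlow itself.

```python
import numpy as np
import tensorflow as tf


def train_once(strategy):
    """Hypothetical helper: build and fit a tiny Keras model under `strategy`."""
    # Variables created inside the scope are placed according to the strategy.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(4,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="sgd", loss="mse")

    # Synthetic data, purely illustrative.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(64, 4)).astype("float32")
    y = rng.normal(size=(64, 1)).astype("float32")

    history = model.fit(x, y, batch_size=16, epochs=1, verbose=0)
    return history.history["loss"]


# Mirrors variables across local GPUs; uses a single CPU replica if none are found.
mirrored = tf.distribute.MirroredStrategy()
# The default strategy runs without any distribution.
default = tf.distribute.get_strategy()

for strategy in (mirrored, default):
    loss = train_once(strategy)
    print(type(strategy).__name__, strategy.num_replicas_in_sync, loss)
```

Only the strategy object passed to `train_once` changes between the two runs; the model and training code inside the helper are identical.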
The tutorials cover how to use
tf.distribute.Strategy to do distributed training with native Keras APIs, custom training loops, and Estimator APIs. They also cover how to save/load a model when using
- Data parallelism is where we run multiple copies of the model on different slices of the input data. This is in contrast to model parallelism where we divide up a single copy of a model across multiple devices. Note: we only support data parallelism for now, but hope t