A multi-worker tf.distribute strategy with parameter servers.

tf.distribute.experimental.ParameterServerStrategy(
    cluster_resolver, variable_partitioner=None
)
Parameter server training is a common data-parallel method for scaling up machine learning on multiple machines. A parameter server training cluster consists of workers and parameter servers. Variables are created on the parameter servers and are read and updated by the workers in each step. By default, workers read and update these variables independently, without synchronizing with each other; this configuration is known as asynchronous training.
In TensorFlow 2, we recommend an architecture based on central coordination for parameter server training. Each worker and parameter server runs a tf.distribute.Server, and on top of that, a coordinator task is responsible for creating resources on workers and parameter servers.
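To make the worker/parameter-server/coordinator layout above concrete, the sketch below builds the TF_CONFIG environment variable that tf.distribute.cluster_resolver.TFConfigClusterResolver reads to discover the cluster. The host names, port, and task counts are placeholders for illustration, not values prescribed by the API.

```python
import json
import os

# Hypothetical cluster layout: two workers, one parameter server ("ps"),
# and one coordinator ("chief"). Host names and ports are placeholders.
cluster_spec = {
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222"],
    "chief": ["chief0.example.com:2222"],
}

def tf_config_for(task_type, task_index):
    """Return the TF_CONFIG JSON string a given task would export.

    Every task in the cluster exports the same "cluster" dict but its
    own "task" entry, so each process knows both the full topology and
    its own role in it.
    """
    return json.dumps({
        "cluster": cluster_spec,
        "task": {"type": task_type, "index": task_index},
    })

# Each process sets its own TF_CONFIG before starting a
# tf.distribute.Server (workers and parameter servers) or constructing
# ParameterServerStrategy (the coordinator).
os.environ["TF_CONFIG"] = tf_config_for("chief", 0)
```

With TF_CONFIG set, the coordinator can construct the strategy via a TFConfigClusterResolver, while worker and parameter server processes start a tf.distribute.Server and block waiting for work.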