Replicate `Estimator.model_fn` over GPUs. (deprecated)
```python
tf.contrib.estimator.replicate_model_fn(
    model_fn,
    loss_reduction=losses.Reduction.SUM_BY_NONZERO_WEIGHTS,
    devices=None
)
```
model_fn specifies a single forward pass of a model. To replicate
such a model over GPUs, each GPU gets its own instance of the forward pass
(a.k.a. a tower). The input features and labels get sharded into the chunks
that correspond to the number of GPUs. Each tower computes a loss based
on its input. For each such loss, gradients are computed. After that, the
available losses are aggregated to form the aggregated loss. Available
gradients are summed. Then, the summed gradients update the weights using the
specified optimizer.

If `devices` is `None`, then all available GPUs are going to be used for
replication. If no GPUs are available, then the model is going to be
placed on the CPU.
Two modes of local replication over available GPUs are supported:
1) If exactly 1 GPU is detected, then variables and operations are placed onto the GPU.
2) If more than 1 GPU is detected, then variables are going to be placed on the CPU. Replicas of operations are placed on each individual GPU.
Here is an example of how one might use their
model_fn to run over GPUs:
```python
...
def model_fn(...):  # See `model_fn` in `Estimator`.
  loss = ...
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
  optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
  if mode == tf.estimator.ModeKeys.TRAIN:
    # See the section below on `EstimatorSpec.train_op`.
    return EstimatorSpec(mode=mode, loss=loss,
                         train_op=optimizer.minimize(loss))

  # No change for `ModeKeys.EVAL` or `ModeKeys.PREDICT`.
  return EstimatorSpec(...)
...
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
```
Please see `DNNClassifierIntegrationTest` for an example with a canned `Estimator`.
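As a rough sketch of that idea (not the integration test itself), a canned estimator's `model_fn` can be obtained through the `Estimator.model_fn` property and replicated the same way; the feature column named `'x'` and the optimizer choice below are assumptions for illustration:

```python
import tensorflow as tf

# Hypothetical feature column; any canned-estimator setup works similarly.
feature_columns = [tf.feature_column.numeric_column('x')]

canned = tf.estimator.DNNClassifier(
    hidden_units=[10, 10],
    feature_columns=feature_columns,
    n_classes=2,
    # Wrap the optimizer so gradients can be aggregated across towers.
    optimizer=tf.contrib.estimator.TowerOptimizer(
        tf.train.AdagradOptimizer(learning_rate=0.05)))

# Replicate the canned estimator's model_fn over the available GPUs.
classifier = tf.estimator.Estimator(
    model_fn=tf.contrib.estimator.replicate_model_fn(canned.model_fn))
```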
On `EstimatorSpec.train_op`: `model_fn` returns `EstimatorSpec.train_op` for
`tf.estimator.ModeKeys.TRAIN`. It is typically derived using an optimizer.
Towers are expected to populate it in the same way. Gradients from all towers
are reduced and applied in the last tower. To achieve that in the case of
multiple optimizers, `TowerOptimizer` needs to be used. See `TowerOptimizer` for details.
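For the multiple-optimizer case, a minimal sketch is shown below; both losses and the variable scopes `part_a`/`part_b` are hypothetical, and each optimizer is wrapped in `TowerOptimizer` so gradients are aggregated across towers before being applied:

```python
import tensorflow as tf

def model_fn(features, labels, mode):
  loss_a = ...  # hypothetical loss for one part of the model
  loss_b = ...  # hypothetical loss for another part of the model
  loss = loss_a + loss_b

  # Each optimizer is wrapped so gradients from all towers are aggregated
  # before being applied in the last tower.
  opt_a = tf.contrib.estimator.TowerOptimizer(
      tf.train.GradientDescentOptimizer(learning_rate=0.001))
  opt_b = tf.contrib.estimator.TowerOptimizer(
      tf.train.AdagradOptimizer(learning_rate=0.01))

  if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = tf.group(
        opt_a.minimize(loss_a, var_list=tf.trainable_variables('part_a')),
        opt_b.minimize(loss_b, var_list=tf.trainable_variables('part_b')))
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss)
```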
On sharding input features and labels: Input features and labels are split for consumption by each tower. They are split along dimension 0. Features and labels need to be batch major.
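Purely to illustrate the dimension-0 split (this is not the function's internal code), a batch of 8 examples spread over 2 towers would be chunked roughly like this:

```python
import tensorflow as tf

# Batch-major tensors: dimension 0 is the batch dimension.
features = tf.placeholder(tf.float32, shape=[8, 10])
labels = tf.placeholder(tf.int64, shape=[8])

# Each of the 2 towers receives one chunk: two [4, 10] feature shards
# and two [4] label shards.
feature_shards = tf.split(features, num_or_size_splits=2, axis=0)
label_shards = tf.split(labels, num_or_size_splits=2, axis=0)
```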
On reduction algorithms: Certain algorithms were chosen for aggregating results of computations on multiple towers:
- Losses from all towers are reduced according to `loss_reduction`.
- Gradients from all towers are reduced according to `loss_reduction` for each trainable variable.
- `eval_metrics_ops` are reduced per metric using `reduce_mean`.
- `EstimatorSpec.export_outputs` are reduced using concatenation.
- For all other fields of `EstimatorSpec` the values of the first tower are taken.
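For example, one might average rather than sum the per-tower losses by passing a different reduction (a sketch; it assumes `tf.losses.Reduction.MEAN` as the value and reuses a `model_fn` defined as in the example above):

```python
# `model_fn` is defined as in the earlier example.
replicated_fn = tf.contrib.estimator.replicate_model_fn(
    model_fn, loss_reduction=tf.losses.Reduction.MEAN)
classifier = tf.estimator.Estimator(model_fn=replicated_fn)
```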
On distribution of variables: Variables are not duplicated between towers. Instead, they are placed on a single device as defined above and shared across towers.
If only one device is specified, then aggregation of loss and gradients
doesn't happen. Replication consists of placing `model_fn` onto the
specified device.
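A brief sketch of the `devices` argument (the device names below are assumptions; adjust them to the machine at hand):

```python
# Pin the whole model to a single device; no loss/gradient aggregation happens.
single_device_fn = tf.contrib.estimator.replicate_model_fn(
    model_fn, devices=['/gpu:0'])

# Replicate over an explicit subset of the available GPUs.
two_gpu_fn = tf.contrib.estimator.replicate_model_fn(
    model_fn, devices=['/gpu:0', '/gpu:1'])

classifier = tf.estimator.Estimator(model_fn=two_gpu_fn)
```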
On current limitations:
- `predictions` are not supported for `ModeKeys.EVAL`. They are required for `tf.contrib.estimator.add_metrics`.
Args:
- `loss_reduction`: controls whether losses are summed or averaged.
- `devices`: Optional list of devices to replicate the model across. This argument can be used to replicate only on the subset of available GPUs.

Raises:
- `ValueError`: if there is no `loss_reduction`.

Returns:
A replicated version of the supplied `model_fn`.