

Optimizer that implements the Adadelta algorithm.

Inherits From: Optimizer

Migrate to TF2

tf.compat.v1.train.AdadeltaOptimizer is compatible with eager mode and tf.function. When eager execution is enabled, learning_rate, rho, and epsilon can each be a callable that takes no arguments and returns the actual value to use. This can be useful for changing these values across different invocations of optimizer functions.
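Below is a minimal sketch of the callable form. It assumes TensorFlow 2.x with eager execution enabled, which is the default. The step counter `step` and the decay function `learning_rate_fn` are hypothetical names used only for illustration. The optimizer calls `learning_rate_fn` with no arguments each time it applies gradients, so the rate it uses shrinks as `step` grows.

```python
import tensorflow as tf

# Hypothetical step counter, maintained by this sketch rather than by the optimizer.
step = tf.Variable(0, dtype=tf.int64, trainable=False)


def learning_rate_fn():
    # Hypothetical decay (1.0, 0.5, 0.333...), re-evaluated on every update.
    return 1.0 / (1.0 + tf.cast(step, tf.float32))


x = tf.Variable([1.0, 2.0, 3.0])
optimizer = tf.compat.v1.train.AdadeltaOptimizer(learning_rate=learning_rate_fn)

for _ in range(3):
    grad = tf.constant([0.1, 0.2, 0.3])
    optimizer.apply_gradients(zip([grad], [x]))
    step.assign_add(1)
```

The same pattern works for `rho` and `epsilon`. Each zero-argument callable is evaluated when the optimizer prepares its hyperparameters for an update.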

To switch to native TF2 style, use tf.keras.optimizers.Adadelta instead. Note that, due to implementation differences, tf.keras.optimizers.Adadelta and tf.compat.v1.train.AdadeltaOptimizer may differ slightly in floating-point numerics even though both use the same update formula.
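As a rough check of this claim, the sketch below applies the same constant gradient with both optimizers and compares the results. It assumes TensorFlow 2.x with eager execution. The hyperparameter values (`learning_rate=1.0`, `rho=0.95`) are chosen only for illustration. `epsilon` is passed explicitly because its default differs between TF1 (1e-08) and TF2 (1e-07).

```python
import numpy as np
import tensorflow as tf

grad = tf.constant([0.1, 0.2, 0.3])

# Identical hyperparameters for both optimizers; epsilon is passed explicitly
# because the TF1 and TF2 defaults differ.
x_tf1 = tf.Variable([1.0, 2.0, 3.0])
tf1_opt = tf.compat.v1.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95, epsilon=1e-8)

x_tf2 = tf.Variable([1.0, 2.0, 3.0])
tf2_opt = tf.keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-8)

# Apply the same constant gradient five times with each optimizer.
for _ in range(5):
    tf1_opt.apply_gradients(zip([grad], [x_tf1]))
    tf2_opt.apply_gradients(zip([grad], [x_tf2]))

# Expected to be tiny when both apply the same update formula.
max_diff = float(np.max(np.abs(x_tf1.numpy() - x_tf2.numpy())))
print(f"largest elementwise difference: {max_diff:.3e}")
```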

Structural mapping to native TF2


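Before:
optimizer = tf.compat.v1.train.AdadeltaOptimizer(learning_rate=learning_rate, rho=rho, epsilon=epsilon)

After:
optimizer = tf.keras.optimizers.Adadelta(learning_rate=learning_rate, rho=rho, epsilon=epsilon)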

How to map arguments

TF1 argument → TF2 argument: note
- learning_rate → learning_rate: Be careful when setting learning_rate to a tensor computed from the global step. In TF1 this usually implied a dynamic learning rate that was recomputed at every step. In TF2 (eager execution plus tf.function), such a tensor is treated as a scalar value computed only once, not as a symbolic placeholder re-evaluated at each step.
- rho → rho: no change.
- epsilon → epsilon: the default is 1e-08 in TF1 but 1e-07 in TF2.
- use_locking → (none): not applicable in TF2.
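In TF2, one way to keep a step-dependent learning rate is to pass a Keras learning-rate schedule instead of a tensor computed once from the global step. The sketch below assumes TensorFlow 2.x. ExponentialDecay and its values (`initial_learning_rate=1.0`, `decay_steps=10`, `decay_rate=0.5`) are an illustrative choice, not something this page prescribes. `epsilon=1e-8` is set only to mirror the TF1 default.

```python
import tensorflow as tf

# A schedule is re-evaluated at optimizer.iterations on every update, unlike a
# tensor that TF2 would compute only once.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1.0, decay_steps=10, decay_rate=0.5)
optimizer = tf.keras.optimizers.Adadelta(learning_rate=schedule, epsilon=1e-8)

x = tf.Variable([1.0, 2.0, 3.0])
for _ in range(3):
    optimizer.apply_gradients(zip([tf.constant([0.1, 0.2, 0.3])], [x]))

# The schedule is a plain callable of the step: 1.0 * 0.5 ** (step / 10).
print(float(schedule(0)), float(schedule(10)))
```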

Before & after usage example


Before:
import tensorflow as tf
x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdadeltaOptimizer(learning_rate=0.001)
optimizer.apply_gradients(zip([grad], [x]))


After:
import tensorflow as tf
x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adadelta(learning_rate=0.001)
optimizer.apply_gradients(zip([grad], [x]))



Reference: ADADELTA: An Adaptive Learning Rate Method (Zeiler, 2012).

Args
learning_rate: A Tensor or a floating-point value. The learning rate. To match the exact form in the original paper, use 1.0.
rho: A Tensor or a floating-point value. The decay rate.
epsilon: A Tensor or a floating-point value. A constant epsilon used to better condition the gradient update.
use_locking: If True, use locks for update operations.
name: Optional name prefix for the operations created when applying gradients. Defaults to "Adadelta".
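To show what rho, epsilon and learning_rate do, the sketch below re-derives one Adadelta update in NumPy. It then checks the result against the TF1 optimizer over three steps. The helper `adadelta_step` is a hypothetical illustration of the update rule, not TensorFlow's kernel. The sketch assumes TensorFlow 2.x in eager mode and uses float64 so the comparison can be tight.

```python
import numpy as np
import tensorflow as tf


def adadelta_step(x, grad, accum, accum_update, lr, rho, epsilon):
    """Hypothetical NumPy version of one Adadelta update (illustration only)."""
    # Decaying average of squared gradients, E[g^2].
    accum = rho * accum + (1.0 - rho) * grad**2
    # Step = RMS(previous updates) / RMS(gradients) * gradient.
    update = np.sqrt(accum_update + epsilon) / np.sqrt(accum + epsilon) * grad
    # Decaying average of squared updates, E[dx^2].
    accum_update = rho * accum_update + (1.0 - rho) * update**2
    # learning_rate scales the step; 1.0 gives the paper's exact form.
    return x - lr * update, accum, accum_update


lr, rho, epsilon = 1.0, 0.95, 1e-8
g = np.array([0.1, 0.2, 0.3])
x_np = np.array([1.0, 2.0, 3.0])
accum = np.zeros(3)
accum_update = np.zeros(3)

x_tf = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float64)
opt = tf.compat.v1.train.AdadeltaOptimizer(learning_rate=lr, rho=rho, epsilon=epsilon)

for _ in range(3):
    x_np, accum, accum_update = adadelta_step(x_np, g, accum, accum_update, lr, rho, epsilon)
    opt.apply_gradients(zip([tf.constant(g)], [x_tf]))
```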



apply_gradients(grads_and_vars, global_step=None, name=None)

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Args
grads_and_vars: List of (gradient, variable) pairs as returned by compute_gradients().
global_step: Optional Variable to increment by one after the variables have been updated.
name: Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.

Returns
An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

Raises
TypeError: If grads_and_vars is malformed.
ValueError: If none of the variables have gradients.
RuntimeError: If you should use _distributed_apply() instead.
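The sketch below splits minimize() into its two parts in eager mode. compute_gradients() produces the (gradient, variable) pairs, and apply_gradients() applies them and increments global_step. It assumes TensorFlow 2.x with eager execution, where compute_gradients() needs a zero-argument callable loss. The loss `loss_fn` and the explicit `global_step` variable are hypothetical choices for illustration.

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
# Explicit step counter passed to apply_gradients().
global_step = tf.Variable(0, dtype=tf.int64, trainable=False)
optimizer = tf.compat.v1.train.AdadeltaOptimizer(learning_rate=1.0)


def loss_fn():
    # Hypothetical loss; in eager mode the loss must be a zero-argument callable.
    return tf.reduce_sum(x**2)


for _ in range(2):
    # First part of minimize(): a list of (gradient, variable) pairs.
    grads_and_vars = optimizer.compute_gradients(loss_fn, var_list=[x])
    # Second part: apply the gradients, then increment global_step by one.
    optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```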


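compute_gradients(loss, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)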

Compute gradients of loss for the variables in var_li