

Optimizer that implements the RMSProp algorithm (Tieleman et al., 2012).

Inherits From: Optimizer

Migrate to TF2

tf.compat.v1.train.RMSPropOptimizer is compatible with eager mode and tf.function. When eager execution is enabled, learning_rate, decay, momentum, and epsilon can each be a callable that takes no arguments and returns the actual value to use. This can be useful for changing these values across different invocations of optimizer functions.
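The callable form can be pictured with a small pure-Python sketch. It does not use TensorFlow: `resolve`, `state`, and `decayed_lr` are hypothetical names chosen for illustration, and the halving schedule is arbitrary. The point is only that a zero-argument callable is re-evaluated on every use, while a plain number is fixed.

```python
from typing import Callable, Union

Hyper = Union[float, Callable[[], float]]


def resolve(value: Hyper) -> float:
    """Return the number to use right now: call callables, pass numbers through."""
    return float(value() if callable(value) else value)


# Hypothetical step counter that the training code would advance.
state = {"step": 0}


def decayed_lr() -> float:
    # Halve the base rate every 100 steps (an arbitrary schedule for illustration).
    return 0.001 * 0.5 ** (state["step"] // 100)


fixed = resolve(0.001)       # a plain value never changes
early = resolve(decayed_lr)  # evaluated at step 0
state["step"] = 200
late = resolve(decayed_lr)   # evaluated again at step 200, so it is smaller
```

Passing `decayed_lr` itself (not `decayed_lr()`) is what keeps the value dynamic.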

To switch to native TF2 style, use tf.keras.optimizers.RMSprop instead. Note that, because of implementation differences, tf.keras.optimizers.RMSprop and tf.compat.v1.train.RMSPropOptimizer may produce slightly different floating point results even though the formula used for the variable updates is the same.

Structural mapping to native TF2


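Before:

optimizer = tf.compat.v1.train.RMSPropOptimizer(
    learning_rate=learning_rate, decay=decay, momentum=momentum, epsilon=epsilon)

After:

optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=learning_rate, rho=decay, momentum=momentum, epsilon=epsilon)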

How to map arguments

| TF1 Arg Name | TF2 Arg Name | Note |
|---|---|---|
| learning_rate | learning_rate | Be careful when setting learning_rate to a tensor value computed from the global step. In TF1 this usually implied a dynamic learning rate that was recomputed at each step. In TF2 (eager + function) it is treated as a scalar value that is computed only once, not as a symbolic placeholder that is recomputed each time. |
| decay | rho | - |
| momentum | momentum | - |
| epsilon | epsilon | Default value is 1e-10 in TF1, but 1e-07 in TF2. |
| use_locking | - | Not applicable in TF2. |
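The renames in the table can be applied mechanically. The following is a minimal sketch, not part of TensorFlow: `map_rmsprop_kwargs` is a hypothetical helper that operates on plain dictionaries of keyword arguments. It assumes you want to keep the TF1 epsilon default when none was given, since the defaults differ, and it passes any other keys through unchanged.

```python
def map_rmsprop_kwargs(tf1_kwargs):
    """Translate TF1 RMSPropOptimizer keyword arguments to the TF2 names.

    Hypothetical helper for illustration; it only renames dictionary keys.
    """
    tf2_kwargs = dict(tf1_kwargs)  # copy so the caller's dict is untouched

    # use_locking has no TF2 counterpart, so drop it.
    tf2_kwargs.pop("use_locking", None)

    # TF1 "decay" corresponds to TF2 "rho".
    if "decay" in tf2_kwargs:
        tf2_kwargs["rho"] = tf2_kwargs.pop("decay")

    # The defaults differ (1e-10 in TF1, 1e-07 in TF2), so pin the TF1 value
    # unless the caller chose one explicitly.
    tf2_kwargs.setdefault("epsilon", 1e-10)
    return tf2_kwargs


mapped = map_rmsprop_kwargs(
    {"learning_rate": 0.001, "decay": 0.9, "use_locking": True}
)
```

The resulting dictionary could then be unpacked into the TF2 constructor, for example `tf.keras.optimizers.RMSprop(**mapped)`.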

Before & after usage example


Before:

import tensorflow as tf

# In eager mode, apply_gradients() updates x immediately.
x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.RMSPropOptimizer(learning_rate=0.001)
optimizer.apply_gradients(zip([grad], [x]))


After:

import tensorflow as tf

# Same update with the native TF2 optimizer.
x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
optimizer.apply_gradients(zip([grad], [x]))





References

Coursera slide 29: Hinton, 2012 (pdf)

Args

| Argument | Description |
|---|---|
| learning_rate | A Tensor or a floating point value. The learning rate. |
| decay | Discounting factor that weights the gradient history against the incoming gradient. |
| momentum | A scalar tensor. The momentum coefficient. |
| epsilon | Small value added to avoid a zero denominator. |
| use_locking | If True, use locks for the update operation. |
| centered | If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False. |
| name | Optional name prefix for the operations created when applying gradients. Defaults to "RMSProp". |
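To make the roles of these arguments concrete, here is a scalar, pure-Python sketch of one RMSProp step. It is an illustration under stated assumptions, not the library's implementation: the function name `rmsprop_step` is hypothetical, epsilon is assumed to be added inside the square root, and the starting accumulator value of 1.0 in the usage lines is an assumption as well.

```python
import math


def rmsprop_step(var, grad, ms, mom, mg=0.0, *, learning_rate, decay=0.9,
                 momentum=0.0, epsilon=1e-10, centered=False):
    """One scalar RMSProp update. Returns the new (var, ms, mom, mg)."""
    # Running average of the squared gradient (uncentered second moment).
    ms = decay * ms + (1.0 - decay) * grad * grad
    if centered:
        # Running average of the gradient itself; subtracting its square
        # turns the second moment into a variance estimate.
        mg = decay * mg + (1.0 - decay) * grad
        denom = ms - mg * mg + epsilon
    else:
        denom = ms + epsilon
    # Momentum accumulates the scaled gradient; assumed epsilon placement.
    mom = momentum * mom + learning_rate * grad / math.sqrt(denom)
    return var - mom, ms, mom, mg


# Assumed starting state: accumulator at 1.0, no momentum yet.
plain = rmsprop_step(1.0, 0.1, ms=1.0, mom=0.0, learning_rate=0.001)
cent = rmsprop_step(1.0, 0.1, ms=1.0, mom=0.0, learning_rate=0.001,
                    centered=True)
```

With `centered=True` the denominator is smaller for the same gradient, so the step is slightly larger, at the cost of tracking one more accumulator (`mg`).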



apply_gradients

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Args

| Argument | Description |
|---|---|
| grads_and_vars | List of (gradient, variable) pairs as returned by compute_gradients(). |
| global_step | Optional Variable to increment by one after the variables have been updated. |
| name | Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor. |

Returns

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

Raises

| Exception | Condition |
|---|---|
| TypeError | If grads_and_vars is malformed. |
| ValueError | If none of the variables have gradients. |
| RuntimeError | If you should use _distributed_apply() instead. |
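The contract described above can be mimicked with a toy, pure-Python stand-in. This is not TensorFlow code: `apply_gradients_sketch` is a hypothetical function, variables are named floats in a dictionary, the update is plain gradient descent instead of RMSProp, and skipping pairs whose gradient is None is an assumption about how such pairs are handled.

```python
from typing import Dict, Iterable, Optional, Tuple


def apply_gradients_sketch(
    grads_and_vars: Iterable[Tuple[Optional[float], str]],
    values: Dict[str, float],
    learning_rate: float = 0.001,
    global_step: Optional[int] = None,
) -> Optional[int]:
    """Toy stand-in: plain gradient descent on named floats."""
    pairs = list(grads_and_vars)
    for pair in pairs:
        if not isinstance(pair, tuple) or len(pair) != 2:
            raise TypeError(f"Expected (gradient, variable) pairs, got {pair!r}")

    # Pairs without a gradient are skipped (assumed behaviour).
    usable = [(g, name) for g, name in pairs if g is not None]
    if not usable:
        raise ValueError("No gradients provided for any variable.")

    for g, name in usable:
        values[name] -= learning_rate * g

    # The step counter advances only after the variables have been updated.
    return None if global_step is None else global_step + 1


values = {"w": 1.0, "b": 0.5}
step = apply_gradients_sketch(
    [(0.2, "w"), (None, "b")], values, learning_rate=0.1, global_step=0
)
```

Here only `w` is updated, `b` is left alone because it has no gradient, and the returned step counter is one higher than the value passed in.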


compute_gradients

Compute gradients of loss for the variables in var_list.

This is the first part of minimize(). It returns a list of (gradient, variable) pairs where "gradient" is the gradient for "variable". Note that "gradient" can be a Tensor, an IndexedSlices, or None if there is no gradien