

Optimizer that implements the Adam algorithm.

Inherits From: Optimizer

Migrate to TF2

tf.compat.v1.train.AdamOptimizer is compatible with eager mode and tf.function. When eager execution is enabled, learning_rate, beta1, beta2, and epsilon can each be a callable that takes no arguments and returns the actual value to use. This can be useful for changing these values across different invocations of optimizer functions.
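As a rough illustration of the callable form, the sketch below passes a zero-argument lambda as learning_rate and changes the value it returns between two eager calls to apply_gradients. The current_lr holder is a hypothetical name introduced only for this example, and the sketch assumes TensorFlow 2.x with eager execution enabled (the default).

```python
import tensorflow as tf

# Hypothetical mutable holder; the optimizer calls the lambda on each
# invocation, so updating the dict changes the rate that is used.
current_lr = {"value": 0.1}

optimizer = tf.compat.v1.train.AdamOptimizer(
    learning_rate=lambda: current_lr["value"])

x = tf.Variable([1.0, 2.0, 3.0])
grad = tf.constant([0.1, 0.2, 0.3])

# First update uses a learning rate of 0.1.
optimizer.apply_gradients([(grad, x)])
after_first = x.numpy().copy()

# Lower the rate; the next invocation picks up the new value.
current_lr["value"] = 0.01
optimizer.apply_gradients([(grad, x)])
after_second = x.numpy().copy()

print(after_first, after_second)
```

With a constant gradient, each Adam step moves a variable by roughly the learning rate, so the second step should be about ten times smaller than the first.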

To switch to native TF2 style, use tf.keras.optimizers.Adam instead. Note that, because of implementation differences, tf.keras.optimizers.Adam and tf.compat.v1.train.AdamOptimizer may differ slightly in floating-point numerics, even though both use the same formula for the variable updates.

Structural Mapping to Native TF2


Before:

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)

After:

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

How to Map Arguments

| TF1 Arg Name | TF2 Arg Name | Note |
|---|---|---|
| learning_rate | learning_rate | Be careful when setting learning_rate to a tensor computed from the global step. In TF1 this usually implied a dynamic learning rate that was recomputed at every step. In TF2 (eager + function) the tensor is treated as a scalar value that is computed only once, not as a symbolic placeholder re-evaluated at each step. |
| beta1 | beta_1 | |
| beta2 | beta_2 | |
| epsilon | epsilon | Default value is 1e-08 in TF1, but 1e-07 in TF2. |
| use_locking | N/A | Not applicable in TF2. |
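The argument mapping above can be sketched as a small pure-Python helper. Note that map_adam_kwargs and its keep_tf1_epsilon flag are hypothetical names invented for this illustration, not part of TensorFlow; the helper only renames keys, drops use_locking, and optionally pins epsilon to the TF1 default so that the different TF2 default does not change the numerics unnoticed.

```python
def map_adam_kwargs(tf1_kwargs, keep_tf1_epsilon=True):
    """Translate AdamOptimizer keyword arguments to tf.keras.optimizers.Adam names.

    Hypothetical helper for illustration only.
    """
    renames = {"beta1": "beta_1", "beta2": "beta_2"}
    tf2_kwargs = {}
    for key, value in tf1_kwargs.items():
        if key == "use_locking":
            # Not applicable in TF2, so it is dropped.
            continue
        tf2_kwargs[renames.get(key, key)] = value
    if keep_tf1_epsilon:
        # TF1 defaults to 1e-08, TF2 to 1e-07; pin the TF1 value
        # unless the caller supplied one explicitly.
        tf2_kwargs.setdefault("epsilon", 1e-8)
    return tf2_kwargs


mapped = map_adam_kwargs(
    {"learning_rate": 0.001, "beta1": 0.9, "beta2": 0.999, "use_locking": False})
print(mapped)
```

The resulting dictionary could then be passed on as tf.keras.optimizers.Adam(**mapped).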

Before & After Usage Example


Before:

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)
# Apply one Adam update to x using the given gradient.
optimizer.apply_gradients(zip([grad], [x]))

After:

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# Apply one Adam update to x using the given gradient.
optimizer.apply_gradients(zip([grad], [x]))




References

Adam: A Method for Stochastic Optimization: Kingma and Ba, 2015

Args

| Arg | Description |
|---|---|
| learning_rate | A Tensor or a floating point value. The learning rate. |
| beta1 | A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates. |
| beta2 | A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates. |
| epsilon | A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. |
| use_locking | If True, use locks for update operations. |
| name | Optional name for the operations created when applying gradients. Defaults to "Adam". |
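To make the role of "epsilon hat" concrete, here is a minimal pure-Python sketch of a single scalar Adam step in the formulation given just before Section 2.1 of the paper, where the bias corrections are folded into the step size and epsilon is added to the uncorrected sqrt(v). The function name adam_step and its scalar signature are assumptions made for this illustration; it is not the TensorFlow kernel.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One scalar Adam update in the "epsilon hat" formulation (illustrative)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias corrections folded into the step size for step t (t starts at 1).
    lr_t = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    # epsilon is added to sqrt(v), not to the bias-corrected sqrt(v_hat).
    param = param - lr_t * m / (math.sqrt(v) + epsilon)
    return param, m, v


param, m, v = 1.0, 0.0, 0.0
param, m, v = adam_step(param, grad=0.1, m=m, v=v, t=1)
print(param)
```

On the first step the update has magnitude close to lr regardless of the gradient scale, which is why param ends up near 0.999 here.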



apply_gradients

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Args

| Arg | Description |
|---|---|
| grads_and_vars | List of (gradient, variable) pairs as returned by compute_gradients(). |
| global_step | Optional Variable to increment by one after the variables have been updated. |
| name | Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor. |

Returns

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

Raises

| Exception | Condition |
|---|---|
| TypeError | If grads_and_vars is malformed. |
| ValueError | If none of the variables have gradients. |
| RuntimeError | If you should use _distributed_apply() instead. |
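The two-step pattern described above, compute_gradients() followed by apply_gradients(), might look like the following in TF1-style graph mode. This is a sketch under stated assumptions: TensorFlow 2.x with the compat.v1 API available, and a toy loss (the sum of squares of x) chosen only for illustration.

```python
import tensorflow as tf

with tf.Graph().as_default():
    x = tf.Variable([1.0, 2.0, 3.0], name="x")
    # Toy loss for illustration: sum of squares of x.
    loss = tf.reduce_sum(tf.square(x))
    global_step = tf.compat.v1.train.get_or_create_global_step()

    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)
    # First part of minimize(): a list of (gradient, variable) pairs.
    grads_and_vars = optimizer.compute_gradients(loss, var_list=[x])
    # Second part: an Operation that applies them and increments global_step.
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        sess.run(train_op)
        step_value, x_value = sess.run([global_step, x])

print(step_value, x_value)
```

Splitting the two calls leaves room to inspect or transform the gradients (for example, clipping them) before they are applied.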

