

Optimizer that implements the Adam algorithm.

Inherits From: Optimizer



Adam: A Method for Stochastic Optimization. Kingma et al., 2015.

learning_rate: A Tensor or a floating point value. The learning rate.
beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
use_locking: If True, use locks for update operations.
name: Optional name for the operations created when applying gradients. Defaults to "Adam".
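
To make the roles of these arguments concrete, here is a minimal pure-Python sketch of a single scalar Adam update in the "epsilon hat" form, where epsilon is added to the square root of the raw second-moment estimate and the bias correction is folded into the step size. The function name `adam_step` and its signature are hypothetical and are not part of the TensorFlow API. The default values are the ones suggested in the Adam paper. The real optimizer operates on tensors and keeps `m`, `v`, and the step count in internal state.

```python
import math


def adam_step(var, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One scalar Adam update (hypothetical helper, "epsilon hat" form).

    var, grad: current parameter value and its gradient.
    m, v: running 1st and 2nd moment estimates from the previous step.
    t: 1-based step count.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction is folded into the effective step size.
    lr_t = learning_rate * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    # epsilon is added to sqrt(v), not to a bias-corrected v.
    var = var - lr_t * m / (math.sqrt(v) + epsilon)
    return var, m, v


# First step from zero-initialized moments.
var, m, v = adam_step(var=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the update magnitude is close to `learning_rate` regardless of the gradient's scale, because the gradient's magnitude cancels between `m` and `sqrt(v)`.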




Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

grads_and_vars: List of (gradient, variable) pairs as returned by compute_gradients().
global_step: Optional Variable to increment by one after the variables have been updated.
name: Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

TypeError: If grads_and_vars is malformed.
ValueError: If none of the variables have gradients.
RuntimeError: If you should use _distributed_apply() instead.
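
The contract described above (pairs in, variables updated, optional step counter incremented, errors for malformed or empty input) can be illustrated with a toy model in plain Python. This is not TensorFlow's implementation. The function below, the `params` dictionary, and the dict-based step counter are hypothetical stand-ins, and the update rule is plain gradient descent rather than Adam so that the focus stays on the calling convention.

```python
def apply_gradients(grads_and_vars, params, learning_rate=0.1, global_step=None):
    """Toy stand-in for the apply step (hypothetical, not the TensorFlow API).

    grads_and_vars: iterable of (gradient, variable_name) tuples; a gradient
        of None means the variable has no gradient and is skipped.
    params: dict mapping variable names to float values, updated in place.
    global_step: optional dict with a "value" key, incremented by one.
    """
    pairs = list(grads_and_vars)
    for pair in pairs:
        if not isinstance(pair, tuple) or len(pair) != 2:
            raise TypeError("expected (gradient, variable) pairs, got %r" % (pair,))
    if all(grad is None for grad, _ in pairs):
        raise ValueError("no gradients provided for any variable")
    for grad, name in pairs:
        if grad is not None:
            params[name] -= learning_rate * grad
    # The counter is advanced only after the variables have been updated.
    if global_step is not None:
        global_step["value"] += 1
    return params


params = {"w": 1.0, "b": 0.5}
step = {"value": 0}
apply_gradients([(2.0, "w"), (None, "b")], params, global_step=step)
```

Keeping the compute and apply steps separate leaves room to inspect or modify the gradients (for example, clipping them) before they are applied.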



Compute gradients of loss for the variables in var_list.

This is the first part of minimize(). It returns a list of (gradient, variable) pairs where "gradient" i