

Optimizer that implements the Adagrad algorithm.

Inherits From: Optimizer

Migrate to TF2

tf.compat.v1.train.AdagradOptimizer is compatible with eager mode and tf.function. When eager execution is enabled, learning_rate, initial_accumulator_value, and epsilon can each be a callable that takes no arguments and returns the actual value to use. This can be useful for changing these values across different invocations of optimizer functions.
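The callable form can be sketched as follows, assuming TF2 with eager execution enabled (the default). The `lr_holder` dictionary is a hypothetical helper introduced only for illustration; setting the rate to zero on the second call is simply a convenient way to show that the callable is read again.

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
grad = tf.constant([0.1, 0.2, 0.3])

# Hypothetical mutable holder; the optimizer only sees the zero-argument callable.
lr_holder = {"lr": 0.1}

optimizer = tf.compat.v1.train.AdagradOptimizer(
    learning_rate=lambda: lr_holder["lr"])

# First call: the callable returns 0.1, so x moves.
optimizer.apply_gradients(zip([grad], [x]))
after_first = x.numpy().copy()

# Change the value between invocations; the callable is evaluated again.
lr_holder["lr"] = 0.0
optimizer.apply_gradients(zip([grad], [x]))
after_second = x.numpy().copy()
```

With a learning rate of zero on the second call, `after_second` should equal `after_first`, which suggests the callable is re-evaluated on each invocation rather than captured once.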

To switch to native TF2 style, use tf.keras.optimizers.Adagrad instead. Note that, because of implementation differences, tf.keras.optimizers.Adagrad and tf.compat.v1.train.AdagradOptimizer may differ slightly in floating-point numerics even though the formula used for the variable updates is the same.
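For orientation, the textbook Adagrad update from Duchi et al. can be sketched in NumPy. This is not TensorFlow's implementation: the function name `adagrad_step`, the starting accumulator value of 0.1, and the placement of `epsilon` in the denominator are assumptions made for illustration.

```python
import numpy as np

def adagrad_step(var, grad, accum, learning_rate=0.001, epsilon=1e-7):
    """One textbook Adagrad step; returns the updated (var, accum)."""
    # Accumulate the squared gradient per element.
    accum = accum + grad ** 2
    # Scale each element's step by the inverse root of its accumulator.
    var = var - learning_rate * grad / (np.sqrt(accum) + epsilon)
    return var, accum

x = np.array([1.0, 2.0, 3.0])
grad = np.array([0.1, 0.2, 0.3])
accum = np.full_like(x, 0.1)  # assumed starting accumulator value

x_new, accum_new = adagrad_step(x, grad, accum)
```

Elements with larger accumulated squared gradients take proportionally smaller steps, which is the per-parameter adaptation that Adagrad is named for.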

Structural mapping to native TF2


optimizer = tf.compat.v1.train.AdagradOptimizer(


optimizer = tf.keras.optimizers.Adagrad(

How to map arguments

| TF1 Arg Name | TF2 Arg Name | Note |
| :--- | :--- | :--- |
| learning_rate | learning_rate | Be careful when setting learning_rate to a tensor value computed from the global step. In TF1 this usually implied a dynamic learning rate that was recomputed at each step. In TF2 (eager + function) it is treated as a scalar value that is computed only once, not as a symbolic placeholder evaluated each time. |
| initial_accumulator_value | initial_accumulator_value | Can be zero in TF2; zero is not accepted in TF1. |
| - | epsilon | epsilon is configurable in TF2. The default value changed from 1e-8 to 1e-7. |
| use_locking | - | Not applicable in TF2. |
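One way to keep a step-dependent learning rate in TF2, given the learning_rate note in the table above, is to pass a learning rate schedule object instead of a precomputed tensor. The sketch below uses `tf.keras.optimizers.schedules.ExponentialDecay`; the decay settings are arbitrary values chosen for illustration, and the zero accumulator and explicit epsilon simply exercise the other two rows of the table.

```python
import tensorflow as tf

# A schedule is evaluated per step rather than computed once up front.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,  # illustrative values, not recommendations
    decay_steps=1,
    decay_rate=0.5)

# The schedule can be probed directly to see how the rate changes with the step.
lr_step0 = float(schedule(0))
lr_step1 = float(schedule(1))
lr_step2 = float(schedule(2))

optimizer = tf.keras.optimizers.Adagrad(
    learning_rate=schedule,
    initial_accumulator_value=0.0,  # zero is allowed in TF2
    epsilon=1e-7)                   # configurable in TF2
```

Whether a schedule or some other mechanism is the better fit depends on how the TF1 code derived its rate from the global step.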

Before & after usage example


Before:

x = tf.Variable([1,2,3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate=0.001)
optimizer.apply_gradients(zip([grad], [x]))


After:

x = tf.Variable([1,2,3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.001)
optimizer.apply_gradients(zip([grad], [x]))




Adaptive Subgradient Methods for Online Learning and Stochastic Optimization: Duchi et al., 2011

learning_rate A Tensor or a floating point value. The learning rate.
initial_accumulator_value A floating point value. Starting value for the accumulators, must be positive.
use_locking If True, use locks for update operations.
name Optional name prefix for the operations created when applying gradients. Defaults to "Adagrad".

ValueError If the initial_accumulator_value is invalid.




Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

grads_and_vars List of (gradient, variable) pairs as returned by compute_gradients().
global_step Optional Variable to increment by one after the variables have been updated.
name Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

TypeError If grads_and_vars is malformed.
ValueError If none of the variables have gradients.
RuntimeError If you should use _distributed_apply() instead.
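The two-part structure of minimize() can be sketched as below, assuming TF2 with eager execution enabled, where compute_gradients() expects the loss as a zero-argument callable. The quadratic loss and the clipping step are illustrative assumptions; clipping is shown only as one reason to process gradients between the two calls.

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate=0.1)

def loss_fn():
    # Illustrative loss: sum of squares, so the gradient is 2 * x.
    return tf.reduce_sum(x * x)

# First part of minimize(): a list of (gradient, variable) pairs.
grads_and_vars = optimizer.compute_gradients(loss_fn, var_list=[x])

# Optional processing between the two parts, here element-wise clipping.
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]

# Second part of minimize(): apply the (processed) gradients.
optimizer.apply_gradients(clipped)
```

Calling minimize() directly is the shorter route when no processing is needed between the two steps.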



Compute gradients of loss for the variables in var_list.

This is the first part of minimize(). It returns a list of (gradient, variable) p