Optimizer that implements the Adam algorithm.

Inherits From: `Optimizer`


#### References:

Adam: A Method for Stochastic Optimization (Kingma & Ba, 2015).

### `__init__`

Construct a new Adam optimizer.

Args
`learning_rate` A Tensor or a floating point value. The learning rate.
`beta1` A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
`beta2` A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
`epsilon` A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
`use_locking` If True use locks for update operations.
`name` Optional name for the operations created when applying gradients. Defaults to "Adam".
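As a quick illustration of the update rule this optimizer implements, here is a self-contained, framework-free sketch of one Adam step on a single scalar parameter. The function name `adam_step` is illustrative, not part of the API; note that `epsilon` is added to the square root of the (running) second moment, with the bias corrections folded into the step size, matching the "epsilon hat" placement described above rather than Algorithm 1 of the paper.

```python
def adam_step(theta, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Return updated (theta, m, v) after one Adam step at 1-based timestep t."""
    m = beta1 * m + (1.0 - beta1) * grad          # 1st moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # 2nd moment estimate
    # Bias corrections are folded into the effective step size.
    lr_t = learning_rate * (1.0 - beta2 ** t) ** 0.5 / (1.0 - beta1 ** t)
    # epsilon is added outside the bias-corrected sqrt ("epsilon hat").
    theta = theta - lr_t * m / (v ** 0.5 + epsilon)
    return theta, m, v

# A first step with any sizable gradient moves theta by roughly learning_rate,
# since the bias-corrected moments make the initial update nearly unit-scaled.
theta, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
```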

## Methods

### `apply_gradients`


Apply gradients to variables. This is the second part of `minimize()`. It returns an `Operation` that applies gradients.

Args
`grads_and_vars` List of (gradient, variable) pairs as returned by `compute_gradients()`.
`global_step` Optional `Variable` to increment by one after the variables have been updated.
`name` Optional name for the returned operation. Defaults to the name passed to the `Optimizer` constructor.

Returns
An `Operation` that applies the specified gradients. If `global_step` was not None, that operation also increments `global_step`.

Raises
`TypeError` If `grads_and_vars` is malformed.
`ValueError` If none of the variables have gradients.
`RuntimeError` If you should use `_distributed_apply()` instead.

### `compute_gradients`


Compute gradients of `loss` for the variables in `var_list`.

This is the first part of `minimize()`. It returns a list of (gradient, variable) pairs where "gradient" is the gradient for "variable". Note that "gradient" can be a `Tensor`, an `IndexedSlices`, or `None` if there is no gradient for the given variable.

Args
`loss` A Tensor containing the value to minimize or a callable taking no arguments which returns the value to minimize. When eager execution is enabled it must be a callable.
`var_list` Optional list or tuple of `tf.Variable` to update to minimize `loss`. Defaults to the list of variables collected in the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
`gate_gradients` How to gate the computation of gradients. Can be `GATE_NONE`, `GATE_OP`, or `GATE_GRAPH`.
`aggregation_method` Specifies the method used to combine gradient terms. Valid values are defined in the class `AggregationMethod`.
`colocate_gradients_with_ops` If True, try colocating gradients with the corresponding op.
`grad_loss` Optional. A `Tensor` holding the gradient computed for `loss`.

Returns
A list of (gradient, variable) pairs. Variable is always present, but gradient can be `None`.
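Splitting `minimize()` into these two calls is useful when you want to transform gradients before applying them, for example to clip them. The following framework-free sketch shows the pattern with plain Python floats standing in for tensors; the two functions here are toy stand-ins for the real methods, and the clipping step is an illustrative transformation, not part of the API.

```python
def compute_gradients(loss_grads, variables):
    # Returns (gradient, variable) pairs; a gradient may be None when the
    # loss does not depend on that variable.
    return list(zip(loss_grads, variables))

def apply_gradients(grads_and_vars, learning_rate=0.1):
    # Plain gradient-descent apply step, skipping None gradients.
    return [v if g is None else v - learning_rate * g
            for g, v in grads_and_vars]

variables = [1.0, 2.0, 3.0]
grads = [10.0, None, -10.0]            # one variable has no gradient

grads_and_vars = compute_gradients(grads, variables)
# Transform the gradients between the two calls, e.g. clip to [-1, 1]:
clipped = [(None if g is None else max(-1.0, min(1.0, g)), v)
           for g, v in grads_and_vars]
new_vars = apply_gradients(clipped)    # [0.9, 2.0, 3.1]
```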