Optimizer that implements the Adam algorithm.

Inherits From: `Optimizer`

#### References:

Adam: A Method for Stochastic Optimization (Kingma and Ba, 2015)

Args
`learning_rate` A Tensor or a floating point value. The learning rate.
`beta1` A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
`beta2` A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
`epsilon` A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
`use_locking` If True, use locks for update operations.
`name` Optional name for the operations created when applying gradients. Defaults to "Adam".
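
To make the role of `beta1`, `beta2`, and the "epsilon hat" form of `epsilon` concrete, the following is a minimal pure-Python sketch of a single Adam update on one scalar variable. It is an illustration under stated assumptions, not the optimizer's actual implementation: `adam_step` is a hypothetical helper, and the default values in its signature are assumed for the example rather than taken from the text above.

```python
import math


def adam_step(var, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update on a scalar (hypothetical helper, for illustration only).

    m and v are the running 1st and 2nd moment estimates; t is the 1-based step.
    Default hyperparameter values are assumptions made for this sketch.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction folded into the step size ("epsilon hat" formulation).
    lr_t = learning_rate * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    # epsilon is added to sqrt(v) directly, not to a bias-corrected sqrt(v_hat).
    var = var - lr_t * m / (math.sqrt(v) + epsilon)
    return var, m, v


# First step from var=5.0 with gradient 10.0 and zero-initialized moments.
new_var, new_m, new_v = adam_step(5.0, 10.0, m=0.0, v=0.0, t=1, learning_rate=0.1)
```

On the first step the update magnitude is close to `learning_rate` regardless of the gradient's scale, which is why `new_var` lands near 4.9 here.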

## Methods

### `apply_gradients`

This is the second part of `minimize()`. It returns an `Operation` that applies gradients.

Args
`grads_and_vars` List of (gradient, variable) pairs as returned by `compute_gradients()`.
`global_step` Optional `Variable` to increment by one after the variables have been updated.
`name` Optional name for the returned operation. Defaults to the name passed to the `Optimizer` constructor.

Returns
An `Operation` that applies the specified gradients. If `global_step` was not None, that operation also increments `global_step`.

Raises
`TypeError` If `grads_and_vars` is malformed.
`ValueError` If none of the variables have gradients.
`RuntimeError` If you should use `_distributed_apply()` instead.
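
As a rough usage sketch, the two parts of `minimize()` can be called separately so that gradients can be processed between them. The example below assumes TensorFlow is installed and that this class is reachable as `tf.compat.v1.train.AdamOptimizer`; the variable `x`, the quadratic loss, and the clipping range are illustrative choices, not part of the API.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Toy problem (assumption for illustration): minimize x^2 starting at x = 5.
    x = tf.compat.v1.get_variable("x", initializer=5.0)
    loss = tf.square(x)

    opt = tf.compat.v1.train.AdamOptimizer(learning_rate=0.1)
    global_step = tf.compat.v1.train.get_or_create_global_step()

    # First part of minimize(): a list of (gradient, variable) pairs.
    grads_and_vars = opt.compute_gradients(loss, var_list=[x])

    # Optional processing between the two parts, here simple value clipping.
    clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]

    # Second part of minimize(): an Operation that applies the gradients
    # and increments global_step by one.
    train_op = opt.apply_gradients(clipped, global_step=global_step)
    init_op = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init_op)
    sess.run(train_op)
    step, x_val = sess.run([global_step, x])
```

After one run of `train_op`, `step` is 1 and `x_val` has moved from 5.0 by roughly one learning-rate-sized step toward zero.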

### `compute_gradients`

Compute gradients of `loss` for the variables in `var_list`.

This is the first part of