tf.keras.optimizers.Adamax

Optimizer that implements the Adamax algorithm.

Inherits From: Optimizer

It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper. Adamax is sometimes superior to Adam, especially in models with embeddings.
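
As a quick illustration, the snippet below compiles a small toy model with the Adamax optimizer using its default hyperparameters; the model itself is an assumed example, not part of the original docs:

import tensorflow as tf

# Adamax with its default hyperparameters (a sketch of typical usage).
optimizer = tf.keras.optimizers.Adamax(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
)

# Toy embedding model (illustrative assumption, not from the original docs).
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=optimizer, loss="mse")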

Initialization:

m = 0  # Initialize 1st moment vector
v = 0  # Initialize the exponentially weighted infinity norm
t = 0  # Initialize timestep

The update rule for parameter w with gradient g is described at the end of section 7.1 of the paper:

t += 1
m = beta1 * m + (1 - beta1) * g
v = max(beta2 * v, abs(g))
current_lr = learning_rate / (1 - beta1 ** t)
w = w - current_lr * m / (v + epsilon)
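
For clarity, here is a minimal NumPy sketch of a single Adamax update step that mirrors the pseudocode above; the function name and toy values are illustrative assumptions, not the library's implementation:

import numpy as np

def adamax_step(w, g, m, v, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-7):
    t += 1
    m = beta1 * m + (1 - beta1) * g                 # biased 1st moment estimate
    v = np.maximum(beta2 * v, np.abs(g))            # exponentially weighted infinity norm
    current_lr = learning_rate / (1 - beta1 ** t)   # bias-corrected step size
    w = w - current_lr * m / (v + epsilon)
    return w, m, v, t

# One step on a toy weight/gradient pair.
w = np.array([0.5, -0.3])
g = np.array([0.1, -0.2])
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v, t = adamax_step(w, g, m, v, t=0)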

As in Adam, epsilon is added for numerical stability (especially to avoid division by zero when v_t == 0).

In contrast to Adam,