Optimizer that implements the Adamax algorithm.
See Migration guide for more details.
tf.keras.optimizers.Adamax(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07,
    name='Adamax', **kwargs
)
It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper. Adamax is sometimes superior to Adam, especially in models with embeddings.
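As a quick illustration of the constructor signature above, here is a minimal sketch of typical usage (the model architecture and dimensions are illustrative assumptions, not part of this page):

import tensorflow as tf

# Instantiate Adamax with the default hyperparameters shown above.
opt = tf.keras.optimizers.Adamax(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07)

# Hypothetical model with an embedding layer, the kind of setup where
# Adamax is sometimes preferred over Adam.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=opt, loss='binary_crossentropy')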
Initialization:

m = 0  # Initialize 1st moment vector
v = 0  # Initialize the exponentially weighted infinity norm
t = 0  # Initialize timestep
The update rule for parameter w with gradient g is described at the end of section 7.1 of the paper:
t += 1
m = beta1 * m + (1 - beta1) * g
v = max(beta2 * v, abs(g))
current_lr = learning_rate / (1 - beta1 ** t)
w = w - current_lr * m / (v + epsilon)
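Taken together, the two blocks above can be exercised as a small NumPy sketch (a restatement of the pseudocode for illustration; adamax_step is a hypothetical helper, not TensorFlow's actual kernel):

import numpy as np

def adamax_step(w, g, m, v, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-07):
    # One parameter update following the pseudocode above.
    t += 1
    m = beta1 * m + (1 - beta1) * g                # decayed 1st moment
    v = np.maximum(beta2 * v, np.abs(g))           # exponentially weighted infinity norm
    current_lr = learning_rate / (1 - beta1 ** t)  # bias-corrected step size
    return w - current_lr * m / (v + epsilon), m, v, t

# One step on a toy two-parameter vector, starting from the initialization above.
w, m, v, t = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2), 0
w, m, v, t = adamax_step(w, np.array([0.5, -0.1]), m, v, t)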
Similarly to Adam, the epsilon is added for numerical stability (especially to get rid of division by zero when v_t == 0).
In contrast to Adam, the sparse implementation of this algorithm (used when the gradient is an IndexedSlices object, typically because of tf.gather or an embedding lookup in the forward pass) only updates variable slices and corresponding m_t, v_t terms when that part of the variable was used in the forward pass.
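A quick way to see this sparse path in action (a hedged sketch; the toy variable and indices are illustrative, not from this page):

import tensorflow as tf

# Gradients of a gather/embedding lookup arrive as tf.IndexedSlices,
# which is what routes Adamax onto its sparse update path.
emb = tf.Variable(tf.random.normal([100, 8]))
with tf.GradientTape() as tape:
    out = tf.reduce_sum(tf.gather(emb, [3, 7]))  # forward pass touches rows 3 and 7 only
grad = tape.gradient(out, emb)
print(isinstance(grad, tf.IndexedSlices))  # True

opt = tf.keras.optimizers.Adamax()
opt.apply_gradients([(grad, emb)])  # only the touched slices (and their m, v) update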