tff.learning.optimizers.Optimizer for Yogi.
tff.learning.optimizers.build_yogi(
    learning_rate: optimizer.Float,
    beta_1: optimizer.Float = 0.9,
    beta_2: optimizer.Float = 0.999,
    epsilon: optimizer.Float = 0.001,
    initial_preconditioner_value: optimizer.Float = 1e-06
) -> tff.learning.optimizers.Optimizer
The Yogi optimizer is based on the paper "Adaptive Methods for Nonconvex Optimization" (Zaheer et al., 2018).
The update rule, given learning rate lr, epsilon eps, accumulator acc, preconditioner s, iteration t, weights w, and gradients g, is:

acc = beta_1 * acc + (1 - beta_1) * g
s = s + (1 - beta_2) * sign(g**2 - s) * (g**2)
normalized_lr = lr * sqrt(1 - beta_2**t) / (1 - beta_1**t)
w = w - normalized_lr * acc / (sqrt(s) + eps)
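The update rule above can be sketched in plain Python on a scalar weight. This is an illustrative sketch only, not the TFF implementation: the helper name `yogi_step` and the toy quadratic objective are assumptions made for the example.

```python
import math

def yogi_step(w, g, acc, s, t, lr=0.1, beta_1=0.9, beta_2=0.999, eps=1e-3):
    """One Yogi update for a scalar weight w with gradient g (illustrative)."""
    acc = beta_1 * acc + (1 - beta_1) * g
    # Additive preconditioner update: sign(g**2 - s) nudges s toward g**2,
    # instead of Adam's multiplicative EMA s = beta_2 * s + (1 - beta_2) * g**2.
    s = s + (1 - beta_2) * math.copysign(1.0, g**2 - s) * (g**2)
    # Bias-corrected step size, as in Adam.
    normalized_lr = lr * math.sqrt(1 - beta_2**t) / (1 - beta_1**t)
    w = w - normalized_lr * acc / (math.sqrt(s) + eps)
    return w, acc, s

# Toy run: minimize f(w) = w**2 (gradient 2 * w) starting from w = 5.0.
w, acc, s = 5.0, 0.0, 1e-6  # s starts at initial_preconditioner_value
for t in range(1, 201):
    w, acc, s = yogi_step(w, 2 * w, acc, s, t)
# After 200 steps, w has moved close to the minimum at 0.
```

Because s changes by at most (1 - beta_2) * g**2 per step in either direction, the effective step size adapts more slowly than Adam's, which is why a larger learning rate is usually appropriate.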
This implementation of Yogi uses additive updates of the preconditioner, as opposed to the multiplicative updates used in Adam. Experiments show better performance across NLP and vision tasks, in both centralized and federated settings.
Typically, use a learning rate about 10x larger than you would use for Adam.