The Adam optimizer is based on "Adam: A Method for Stochastic Optimization" (Kingma & Ba, 2014).

The update rule, given learning rate lr, epsilon eps, first-moment accumulator acc, second-moment preconditioner s, iteration t, weights w, and gradients g, is:

acc = beta_1 * acc + (1 - beta_1) * g
s = beta_2 * s + (1 - beta_2) * g**2
normalized_lr = lr * sqrt(1 - beta_2**t) / (1 - beta_1**t)
w = w - normalized_lr * acc / (sqrt(s) + eps)
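The four lines above can be sketched as a single NumPy function. This is a minimal illustration of the rule as written, not the library's actual implementation; the function name `adam_step` and the default hyperparameter values are assumptions for the sketch.

```python
import numpy as np

def adam_step(w, g, acc, s, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update (t starts at 1). Names mirror the rule above:
    acc is the first-moment accumulator, s the second-moment preconditioner."""
    acc = beta_1 * acc + (1 - beta_1) * g
    s = beta_2 * s + (1 - beta_2) * g**2
    # Bias correction folded into the learning rate.
    normalized_lr = lr * np.sqrt(1 - beta_2**t) / (1 - beta_1**t)
    w = w - normalized_lr * acc / (np.sqrt(s) + eps)
    return w, acc, s
```

Because the per-step displacement is roughly bounded by lr regardless of the gradient's scale, repeatedly applying `adam_step` to, say, the gradient of w**2 walks w toward zero in steps of about lr.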

learning_rate: A positive float, the step size of the update.
beta_1: A float between 0.0 and 1.0, the exponential decay rate used to track the mean (first moment) of previous gradients.
beta_2: A float between 0.0 and 1.0, the exponential decay rate used to track the magnitude (second moment) of previous gradients.
epsilon: A small non-negative float added to the denominator to maintain numerical stability.
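The normalized_lr factor in the update rule folds Adam's bias correction into the learning rate. A quick numeric check (with eps omitted, since it sits outside the correction) shows this is equivalent to the paper's form, which divides each moment by its own correction term; all values here are arbitrary illustration inputs.

```python
import numpy as np

lr, beta_1, beta_2, t = 0.001, 0.9, 0.999, 1
g = np.array([0.5])

# Moments after the first step, starting from zero accumulators.
acc = (1 - beta_1) * g
s = (1 - beta_2) * g**2

# Folded form used in the update rule above (eps omitted for the comparison).
normalized_lr = lr * np.sqrt(1 - beta_2**t) / (1 - beta_1**t)
folded = normalized_lr * acc / np.sqrt(s)

# Paper's form: bias-correct each moment, then apply the raw lr.
acc_hat = acc / (1 - beta_1**t)
s_hat = s / (1 - beta_2**t)
direct = lr * acc_hat / np.sqrt(s_hat)
```

With a constant gradient the corrected ratio acc_hat / sqrt(s_hat) is exactly 1, so the very first step moves the weight by lr, which is why lr is often described as the approximate step size.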
