AdaMax optimizer.
A variant of Adam based on the infinity-norm.
Reference: Section 7 of “Adam: A Method for Stochastic Optimization”
Declaration
public class AdaMax<Model: Differentiable & KeyPathIterable>: Optimizer
where
  Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
    & KeyPathIterable,
  Model.TangentVector.VectorSpaceScalar == Float
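A minimal usage sketch (not taken from this reference page), assuming the Swift for TensorFlow Layer, Dense, meanSquaredError, and gradient(at:) APIs; LinearModel, x, and y are illustrative placeholders.

import TensorFlow

// Illustrative toy model; Dense comes from the TensorFlow library.
struct LinearModel: Layer {
    var dense = Dense<Float>(inputSize: 4, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = LinearModel()
// Default hyperparameters follow the paper (learningRate: 0.002, beta1: 0.9, beta2: 0.999).
let optimizer = AdaMax(for: model)

let x = Tensor<Float>(randomNormal: [8, 4])
let y = Tensor<Float>(zeros: [8, 1])

// One training step: differentiate the loss and apply the AdaMax update.
let grad = gradient(at: model) { model -> Tensor<Float> in
    meanSquaredError(predicted: model(x), expected: y)
}
optimizer.update(&model, along: grad)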
-
Declaration
public typealias Model = Model
-
The learning rate.
Declaration
public var learningRate: Float
-
Decay rate used to estimate the first moment (mean) of gradients.
Declaration
public var beta1: Float
-
Decay rate used to estimate the exponentially weighted infinity norm.
Declaration
public var beta2: Float
-
A small scalar added to the denominator to improve numerical stability.
Declaration
public var epsilon: Float
-
The learning rate decay.
Declaration
public var decay: Float
-
The step count.
Declaration
public var step: Int
-
The first moments of the weights.
Declaration
public var firstMoments: Model.TangentVector
-
The exponentially weighted infinity norm of the weights.
Declaration
public var infinityNorm: Model.TangentVector
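Together, firstMoments and infinityNorm drive the parameter update. The following scalar-form sketch of one AdaMax step follows Section 7.1 of the paper; it is written for a single weight and is not the library's vectorized implementation, which may differ in detail (for example, exactly where epsilon enters).

import Foundation

var m: Float = 0  // first moment estimate (corresponds to firstMoments)
var u: Float = 0  // exponentially weighted infinity norm (corresponds to infinityNorm)

func adaMaxStep(
    theta: inout Float, gradient g: Float, step t: Int,
    learningRate: Float = 0.002, beta1: Float = 0.9,
    beta2: Float = 0.999, epsilon: Float = 1e-8
) {
    m = beta1 * m + (1 - beta1) * g    // update biased first moment
    u = max(beta2 * u, abs(g))         // update exponentially weighted infinity norm
    // Bias correction is applied only to the first moment.
    let biasCorrection = 1 - Float(pow(Double(beta1), Double(t)))
    let stepSize = learningRate / biasCorrection
    theta -= stepSize * m / (u + epsilon)  // parameter update
}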
-
Note: The default parameters follow those provided in the paper.
Declaration
public init(
  for model: __shared Model,
  learningRate: Float = 0.002,
  beta1: Float = 0.9,
  beta2: Float = 0.999,
  epsilon: Float = 1e-8,
  decay: Float = 0
)
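A sketch of constructing the optimizer with custom hyperparameters; the values shown are illustrative, and the decay comment reflects how comparable optimizers in this library typically apply decay (an assumption, not stated on this page).

let customOptimizer = AdaMax(
    for: model,
    learningRate: 1e-3,
    beta1: 0.9,
    beta2: 0.999,
    epsilon: 1e-8,
    decay: 1e-4  // assumed to shrink the effective learning rate as `step` grows
)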
-
Declaration
public required init(copying other: AdaMax, to device: Device)
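A hedged sketch of copying the optimizer state to a specific device; Device.defaultTFEager is assumed to be available in this version of the library.

// Duplicate the optimizer (step count, first moments, infinity norm) onto the eager device.
let copiedOptimizer = AdaMax(copying: optimizer, to: Device.defaultTFEager)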