An AdaGrad optimizer.
Implements the AdaGrad (adaptive gradient) optimization algorithm. AdaGrad has parameter-specific learning rates, which are adapted relative to how frequently parameters get updated during training. Parameters that receive more updates have smaller learning rates.
AdaGrad individually adapts the learning rates of all model parameters by scaling them to be inversely proportional to the square root of the running sum of squares of gradient norms.
Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (Duchi et al., 2011)
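The per-element update can be sketched as follows. This is an illustrative scalar version (the variable names and the scalar setting are assumptions for exposition), not the library's exact implementation:

// Illustrative AdaGrad step on a single scalar parameter; the optimizer
// applies the same rule element-wise across the model's tangent vector.
var parameter: Float = 1.0
var accumulator: Float = 0.1          // initialAccumulatorValue
let learningRate: Float = 1e-3
let epsilon: Float = 1e-8

let grad: Float = 0.5                 // example gradient for this step
accumulator += grad * grad
parameter -= learningRate * grad / (accumulator.squareRoot() + epsilon)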
public typealias Model = Model
The learning rate.
public var learningRate: Float
A small scalar added to the denominator to improve numerical stability.
public var epsilon: Float
The running sum of squares of gradient norms.
public var accumulator: Model.TangentVector
Creates an instance for model.
public init( for model: __shared Model, learningRate: Float = 1e-3, initialAccumulatorValue: Float = 0.1, epsilon: Float = 1e-8 )
The learning rate. The default value is 1e-3.
The starting value for the running sum of squares of gradient norms. The default value is 0.1.
A small scalar added to the denominator to improve numerical stability. The default value is 1e-8.
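A minimal usage sketch follows. MyModel, inputBatch, and labelBatch are hypothetical stand-ins for a user-defined Layer and its training data, and the hyperparameter values are arbitrary:

import TensorFlow

// Hypothetical model: a single dense layer mapping 4 features to 1 output.
struct MyModel: Layer {
    var dense = Dense<Float>(inputSize: 4, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = MyModel()
let optimizer = AdaGrad(for: model, learningRate: 1e-2)

// Placeholder training batch (assumed shapes for illustration).
let inputBatch = Tensor<Float>(randomNormal: [8, 4])
let labelBatch = Tensor<Float>(zeros: [8, 1])

// One training step: compute gradients of the loss and apply the AdaGrad update.
let grads = gradient(at: model) { model -> Tensor<Float> in
    meanSquaredError(predicted: model(inputBatch), expected: labelBatch)
}
optimizer.update(&model, along: grads)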
public required init(copying other: AdaGrad, to device: Device)