public class AdaGrad<Model: Differentiable>: Optimizer
    where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions,
          Model.TangentVector.VectorSpaceScalar == Float

AdaGrad optimizer.

Individually adapts the learning rates of all model parameters by scaling them to be inversely proportional to the square root of the sum of all historical squared values of the gradient.

Reference: "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" (Duchi, Hazan & Singer, 2011)
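The update the paragraph above describes can be written as follows, where g_t is the gradient, α_t the accumulator, η the learning rate, and ε the stabilizer (applied elementwise for vector parameters; a standard formulation of AdaGrad, not copied from the library source):

```latex
\alpha_t = \alpha_{t-1} + g_t^{\,2}, \qquad
\theta_t = \theta_{t-1} - \frac{\eta\, g_t}{\sqrt{\alpha_t} + \epsilon}
```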

  • Declaration

    public typealias Model = Model
  • The learning rate.


    public var learningRate: Float
  • The smoothing factor (ρ). Typical values are 0.5, 0.9, and 0.99, for smoothing over 2, 10, and 100 examples, respectively.


    public var rho: Float
  • A small scalar added to the denominator to improve numerical stability.


    public var epsilon: Float
  • The alpha values (the running sums of squared gradients) for all model differentiable variables.


    public var alpha: Model.TangentVector
  • Declaration

    public init(
        for model: __shared Model,
        learningRate: Float = 0.001,
        rho: Float = 0.9,
        epsilon: Float = 1e-8
    )
  • Declaration

    public func update(_ model: inout Model, along direction: Model.TangentVector)
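To make the role of the accumulator concrete, here is a minimal, self-contained sketch of the AdaGrad rule on a single scalar parameter. It is plain Swift with no TensorFlow dependency; `ScalarAdaGrad` is illustrative only and merely mirrors the shape of `update(_:along:)` above.

```swift
// Illustrative scalar analogue of AdaGrad (assumption: plain Swift,
// not the library's Tensor-based implementation).
struct ScalarAdaGrad {
    var learningRate: Float
    var epsilon: Float = 1e-8
    var alpha: Float = 0  // historical sum of squared gradients

    mutating func update(_ parameter: inout Float, along direction: Float) {
        alpha += direction * direction
        parameter -= learningRate * direction / (alpha.squareRoot() + epsilon)
    }
}

// Minimize f(x) = x² from x = 5; the gradient of f is 2x.
var x: Float = 5
var optimizer = ScalarAdaGrad(learningRate: 1.0)
for _ in 0..<100 {
    optimizer.update(&x, along: 2 * x)
}
print(x)  // approaches 0
```

Because `alpha` only grows, the effective step size for each parameter shrinks over time, which is AdaGrad's characteristic behavior: parameters with large historical gradients take progressively smaller steps.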