public class AdaGrad<Model: Differentiable>: Optimizer
    where Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions,
          Model.TangentVector.VectorSpaceScalar == Float

AdaGrad optimizer.

Individually adapts the learning rates of all model parameters by scaling them in inverse proportion to the square root of the running sum of the historical squared gradient values.

Reference: "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization" (Duchi, Hazan, and Singer, 2011)
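
As a minimal sketch (not the library's implementation), the scalar form of this rule looks as follows; the optimizer applies the same update element-wise across the model's tangent vector, and the role of rho is omitted here for simplicity:

    // Scalar illustration of the AdaGrad rule: the accumulator only grows,
    // so each parameter's effective step size shrinks over time.
    let learningRate: Float = 0.01
    let epsilon: Float = 1e-8

    var parameter: Float = 0.5
    var accumulator: Float = 0.0  // running sum of squared gradients

    for gradient in [0.3, -0.2, 0.1] as [Float] {
        accumulator += gradient * gradient
        parameter -= learningRate * gradient / (accumulator.squareRoot() + epsilon)
    }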

  • Declaration

    public typealias Model = Model
  • The learning rate.

    Declaration

    public var learningRate: Float
  • The smoothing factor (ρ). Typical values are 0.5, 0.9, and 0.99, for smoothing over 2, 10, and 100 examples, respectively.

    Declaration

    public var rho: Float
  • A small scalar added to the denominator to improve numerical stability.

    Declaration

    public var epsilon: Float
  • The accumulated sums of squared gradients (the alpha values) for all model differentiable variables.

    Declaration

    public var alpha: Model.TangentVector
  • Declaration

    public init(
        for model: __shared Model,
        learningRate: Float = 0.001,
        rho: Float = 0.9,
        epsilon: Float = 1e-8
    )
  • Declaration

    public func update(_ model: inout Model, along direction: Model.TangentVector)
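
A hedged usage sketch, assuming the TensorFlow Swift APIs; the Classifier model, the batch, and the labels below are hypothetical placeholders, not part of this API:

    import TensorFlow

    // Hypothetical model; any Differentiable type satisfying the constraints above works.
    struct Classifier: Layer {
        var dense = Dense<Float>(inputSize: 4, outputSize: 3)

        @differentiable
        func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
            dense(input)
        }
    }

    var model = Classifier()
    let optimizer = AdaGrad(for: model, learningRate: 0.01)

    // Placeholder batch; in practice this comes from a dataset.
    let batch = Tensor<Float>(randomNormal: [8, 4])
    let labels = Tensor<Int32>(zeros: [8])

    // One training step: compute the gradient, then move the model against it,
    // with each parameter's step scaled by the inverse square root of its
    // accumulated squared gradients.
    let (loss, gradient) = valueWithGradient(at: model) { model -> Tensor<Float> in
        softmaxCrossEntropy(logits: model(batch), labels: labels)
    }
    optimizer.update(&model, along: gradient)
    print("loss:", loss)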