AdaGrad

public class AdaGrad<Model: Differentiable>: Optimizer
where
  Model.TangentVector: VectorProtocol & PointwiseMultiplicative
    & ElementaryFunctions & KeyPathIterable,
  Model.TangentVector.VectorSpaceScalar == Float

An AdaGrad optimizer.

Implements the AdaGrad (adaptive gradient) optimization algorithm. AdaGrad uses parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training: parameters that receive more updates have smaller learning rates.

AdaGrad individually adapts the learning rates of all model parameters, scaling them in inverse proportion to the square root of the running sum of squares of gradient norms.

Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (Duchi et al., 2011)
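
For intuition, the following is a minimal scalar sketch of one AdaGrad step, assuming the per-parameter form of the rule described above; the optimizer itself applies the rule to a whole Model.TangentVector rather than a single Float, and the parameter and gradient values below are illustrative.

// One AdaGrad step for a single parameter.
// The running sum of squared gradients only grows, so the effective
// step size for this parameter shrinks over time.
var parameter: Float = 1.0
var accumulator: Float = 0.1       // initialAccumulatorValue
let learningRate: Float = 1e-3
let epsilon: Float = 1e-8

let gradient: Float = 0.5          // hypothetical gradient for this step
accumulator += gradient * gradient
parameter -= learningRate * gradient / (accumulator.squareRoot() + epsilon)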

  • The model type whose parameters the optimizer updates.

    Declaration

    public typealias Model = Model
  • The learning rate.

    Declaration

    public var learningRate: Float
  • A small scalar added to the denominator to improve numerical stability.

    Declaration

    public var epsilon: Float
  • The running sum of squares of gradient norms.

    Declaration

    public var accumulator: Model.TangentVector
  • Creates an instance for model.

    Declaration

    public init(
      for model: __shared Model,
      learningRate: Float = 1e-3,
      initialAccumulatorValue: Float = 0.1,
      epsilon: Float = 1e-8
    )

    Parameters

    learningRate

    The learning rate. The default value is 1e-3.

    initialAccumulatorValue

    The starting value for the running sum of squares of gradient norms. The default value is 0.1.

    epsilon

    A small scalar added to the denominator to improve numerical stability. The default value is 1e-8.

  • Updates the given model along the given direction; see the usage sketch after this list.

    Declaration

    public func update(_ model: inout Model, along direction: Model.TangentVector)
  • Creates a copy of the given optimizer, with its state placed on the specified device.

    Declaration

    public required init(copying other: AdaGrad, to device: Device)
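
The following is a hypothetical end-to-end usage sketch, assuming a Swift for TensorFlow toolchain; the LinearRegression layer, the input and target tensors, the loss function, and the learning rate of 1e-2 are illustrative and not part of this API.

import TensorFlow

struct LinearRegression: Layer {
    var dense = Dense<Float>(inputSize: 2, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = LinearRegression()
let optimizer = AdaGrad(for: model, learningRate: 1e-2)

let x: Tensor<Float> = [[1.0, 2.0], [3.0, 4.0]]
let y: Tensor<Float> = [[0.0], [1.0]]

// One training step: differentiate the loss with respect to the model,
// then let the optimizer scale and apply the update.
let grad = gradient(at: model) { model -> Tensor<Float> in
    meanSquaredError(predicted: model(x), expected: y)
}
optimizer.update(&model, along: grad)

Because the running sum of squares of gradient norms only grows, the effective learning rates shrink as training proceeds; a larger initialAccumulatorValue makes the earliest steps smaller.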