GeneralOptimizer

public class GeneralOptimizer<Model: EuclideanDifferentiable>: Optimizer
where
  Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable,
  Model.TangentVector.VectorSpaceScalar == Float

General optimizer that should be able to express multiple possible optimization schemes. The optimizer is composed of a mapping from ParameterGroup to ParameterGroupOptimizer. It also stores the number of elements participating in a cross-replica sum; this is kept for efficiency, to avoid multiple inefficient iterations over the gradient.

  • The model type whose parameters this optimizer updates.

    Declaration

    public typealias Model = Model
  • The number of steps taken.

    Declaration

    public var step: Int
  • Used to determine the scaling factor of the cross-replica sum.

    Declaration

    public var crossReplicaSumCount: Int?
  • Global optimizer state.

    Declaration

    public var optimizerState: OptimizerState
  • Current device of the model (used for constructing hyperparameters).

    Declaration

    public var device: Device
  • An array mapping nested weight indices to parameter group optimizers. Weight i will be optimized by parameterGroups[parameterGroupIndices[i]].

    Declaration

    public var parameterGroupIndices: [Int]
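The indexing scheme above can be illustrated with a minimal plain-Swift sketch (no TensorFlow dependency; all names here are hypothetical). Weights are modeled as Floats and each parameter group "optimizer" as a closure:

```swift
// Two illustrative group optimizers: group 0 applies weight decay,
// group 1 leaves the weight unchanged.
let groupOptimizers: [(Float) -> Float] = [
    { $0 * 0.99 },  // group 0: decay
    { $0 },         // group 1: identity
]

// Weight i is optimized by groupOptimizers[parameterGroupIndices[i]].
let parameterGroupIndices = [0, 1, 0]
var weights: [Float] = [1.0, 1.0, 2.0]

for i in weights.indices {
    weights[i] = groupOptimizers[parameterGroupIndices[i]](weights[i])
}
print(weights)
```

Because the assignment is a flat [Int], deciding which optimizer handles a weight is a single array lookup per weight.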
  • An array of parameter group optimizers.

    Declaration

    public var parameterGroups: [ParameterGroupOptimizer]
  • Overall learning rate of the optimizer.

    Declaration

    public var learningRate: Float { get set }
  • Per-parameter group optimizer learning rates.

    Declaration

    public var learningRates: [Float] { get set }
  • Constructs an optimizer from a list of parameter group optimizers and a selector that divides the weights into different parameter groups. This is the most general constructor, as there are many ways to build the selector vector.

    Declaration

    public init(
      for model: __shared Model,
      _ kpPlan: TensorVisitorPlan<Model.TangentVector>,
      parameterGroupIndices: [Int],
      parameterGroups: [ParameterGroupOptimizer]
    )
  • Constructs an optimizer from a sequence of per-parameter-group selectors and optimizers, followed by a final default parameter group optimizer. Each [Bool] array has one entry per weight and is true for the weights belonging to that parameter group. When a weight matches multiple groups, the first matching parameterGroup takes precedence over subsequent ones.

    Declaration

    public convenience init(
      for model: __shared Model,
      _ kpPlan: TensorVisitorPlan<Model.TangentVector>,
      parameterGroups: ([Bool], ParameterGroupOptimizer)...,
      defaultOptimizer: ParameterGroupOptimizer
    )
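A hypothetical plain-Swift sketch of how the ([Bool], ParameterGroupOptimizer)... selectors could collapse into the parameterGroupIndices array used by the primary initializer (groups are represented by index only; the function name is illustrative, not the library's):

```swift
/// For each weight, returns the index of the first group whose mask is
/// true (first match wins), or defaultGroup when no mask matches.
func groupIndices(masks: [[Bool]], defaultGroup: Int) -> [Int] {
    let weightCount = masks.first?.count ?? 0
    return (0..<weightCount).map { i in
        masks.firstIndex { $0[i] } ?? defaultGroup
    }
}

// Three weights and two selector groups; unmatched weights fall through
// to the default group (index 2 here).
let masks: [[Bool]] = [
    [true, false, false],  // group 0
    [true, true,  false],  // group 1: weight 0 matches both, group 0 wins
]
let indices = groupIndices(masks: masks, defaultGroup: 2)
print(indices)  // [0, 1, 2]
```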
  • The actual optimizer step. Maps over all the tensors of the gradient and applies the per-weight optimizers defined by the corresponding ParameterGroupOptimizer.

    Declaration

    public func update(_ model: inout Model, along direction: Model.TangentVector)
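A hedged, plain-Swift sketch of what update(_:along:) does conceptually (the names and the plain-SGD rule are illustrative assumptions, not the library's code): walk the gradient and apply each weight's parameter group optimizer, here gradient descent with a per-group learning rate.

```swift
let learningRates: [Float] = [0.1, 0.01]  // one rate per parameter group
let parameterGroupIndices = [0, 0, 1]     // weight i -> its group
var model: [Float] = [1.0, 2.0, 3.0]      // stand-in for model weights
let direction: [Float] = [0.5, 0.5, 1.0]  // stand-in for the gradient

for i in model.indices {
    let lr = learningRates[parameterGroupIndices[i]]
    model[i] -= lr * direction[i]
}
print(model)
```

The real method operates on Model.TangentVector tensors rather than Floats, but the dispatch structure is the same: one pass over the gradient, with each weight routed to its group's optimizer.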
  • Copies the optimizer to the specified device.

    Declaration

    public required init(copying other: GeneralOptimizer, to device: Device)