
Classes

The following classes are available globally.

  • A mutable, shareable, owning reference to a tensor.

    Declaration

    public final class Parameter<Scalar> where Scalar : TensorFlowScalar
    extension Parameter: CopyableToDevice
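
    Because `Parameter` is a class, copies of a parameter share storage. The following usage sketch assumes a conventional `init(_:)` initializer and a `value` property; it is an illustration, not part of the declaration above.

    import TensorFlow

    // Usage sketch (assumed API: `init(_:)` and `value`).
    let weights = Parameter(Tensor<Float>([1, 2, 3]))
    let alias = weights            // Reference semantics: both names refer to the same storage.
    alias.value += 1               // Mutating through one reference...
    print(weights.value)           // ...is visible through the other: [2.0, 3.0, 4.0]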
  • Class wrapping a C pointer to a TensorHandle. This class owns the TensorHandle and is responsible for destroying it.

    Declaration

    public class TFETensorHandle : _AnyTensorHandle
    extension TFETensorHandle: Equatable
  • RMSProp optimizer.

    It is recommended to leave the parameters of this optimizer at their default values (except for the learning rate, which can be freely tuned). This optimizer is usually a good choice for recurrent neural networks.

    Reference: “rmsprop: Divide the gradient by a running average of its recent magnitude”

    Declaration

    public class RMSProp<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
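
    To make the running-average behavior concrete, here is a minimal sketch of the RMSProp step for a single tensor. The names `learningRate`, `rho`, and `epsilon` are illustrative hyperparameters; this is not the library's implementation.

    import TensorFlow

    // One RMSProp step for a single tensor (sketch only).
    func rmsPropStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        runningAverage: inout Tensor<Float>,
        learningRate: Float = 1e-3, rho: Float = 0.9, epsilon: Float = 1e-8
    ) {
        // Keep an exponentially decaying average of squared gradients...
        runningAverage = rho * runningAverage + (1 - rho) * gradient * gradient
        // ...and divide the gradient by its root before taking a step.
        parameter -= learningRate * gradient / (sqrt(runningAverage) + epsilon)
    }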
  • AdaGrad optimizer.

    Individually adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the sum of all the historical squared values of the gradient.

    Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”

    Declaration

    public class AdaGrad<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
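
    As a sketch of the adaptation described above for a single tensor (illustrative names, not the library's implementation):

    import TensorFlow

    // One AdaGrad step for a single tensor (sketch only).
    func adaGradStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        accumulator: inout Tensor<Float>,
        learningRate: Float = 1e-3, epsilon: Float = 1e-8
    ) {
        // Accumulate the sum of all historical squared gradients...
        accumulator += gradient * gradient
        // ...so each parameter's effective step shrinks as its gradients accumulate.
        parameter -= learningRate * gradient / (sqrt(accumulator) + epsilon)
    }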
  • ADADELTA optimizer.

    ADADELTA is a more robust extension of AdaGrad. ADADELTA adapts learning rates based on a moving window of gradient updates rather than by accumulating all past gradient norms. It can thus adapt faster to changing dynamics of the optimization problem space.

    Reference: “ADADELTA: An Adaptive Learning Rate Method”

    Declaration

    public class AdaDelta<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
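
    The moving-window idea can be sketched as follows for a single tensor (illustrative names and default values; not the library's implementation):

    import TensorFlow

    // One AdaDelta step for a single tensor (sketch only).
    func adaDeltaStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        avgSquaredGradient: inout Tensor<Float>,
        avgSquaredDelta: inout Tensor<Float>,
        rho: Float = 0.95, epsilon: Float = 1e-6
    ) {
        // Decaying average of squared gradients (the "moving window").
        avgSquaredGradient = rho * avgSquaredGradient + (1 - rho) * gradient * gradient
        // Scale the step by the ratio of recent update magnitudes to gradient magnitudes.
        let delta = -sqrt(avgSquaredDelta + epsilon) / sqrt(avgSquaredGradient + epsilon) * gradient
        // Decaying average of squared updates, which scales the next step.
        avgSquaredDelta = rho * avgSquaredDelta + (1 - rho) * delta * delta
        parameter += delta
    }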
  • Adam optimizer.

    Implements the Adam optimization algorithm. Adam is a stochastic gradient descent method that computes individual adaptive learning rates for different parameters from estimates of first- and second-order moments of the gradients.

    Reference: “Adam: A Method for Stochastic Optimization” (Kingma and Ba, 2014).

    Examples:

    • Train a simple reinforcement learning agent:
    ...
    // Instantiate an agent's policy, approximated by the neural network (`net`), after defining it in advance.
    var net = Net(observationSize: Int(observationSize), hiddenSize: hiddenSize, actionCount: actionCount)
    // Define the Adam optimizer for the network with a learning rate set to 0.01.
    let optimizer = Adam(for: net, learningRate: 0.01)
    ...
    // Begin training the agent (over a certain number of episodes).
    while true {
    ...
        // Implementing the gradient descent with the Adam optimizer:
        // Define the gradients (use withLearningPhase to call a closure under a learning phase).
        let gradients = withLearningPhase(.training) {
            TensorFlow.gradient(at: net) { net -> Tensor<Float> in
                // Return the softmax cross-entropy loss.
                return softmaxCrossEntropy(logits: net(input), probabilities: target)
            }
        }
        // Update the differentiable variables of the network (`net`) along the gradients
        // with the Adam optimizer.
        optimizer.update(&net, along: gradients)
        ...
    }
    
    • Train a generative adversarial network (GAN):
    ...
    // Instantiate the generator and the discriminator networks after defining them.
    var generator = Generator()
    var discriminator = Discriminator()
    // Define an Adam optimizer for each network, with the learning rate set to 2e-4 and beta1 set to 0.5.
    let adamOptimizerG = Adam(for: generator, learningRate: 2e-4, beta1: 0.5)
    let adamOptimizerD = Adam(for: discriminator, learningRate: 2e-4, beta1: 0.5)
    ...
    // Start the training loop over a certain number of epochs (`epochCount`).
    for epoch in 1...epochCount {
        // Start the training phase.
        ...
        for batch in trainingShuffled.batched(batchSize) {
            // Implementing the gradient descent with the Adam optimizer:
            // 1) Update the generator.
            ...
            let 𝛁generator = TensorFlow.gradient(at: generator) { generator -> Tensor<Float> in
                ...
                return loss
            }
            // Update the differentiable variables of the generator along the gradients (`𝛁generator`) 
            // with the Adam optimizer.
            adamOptimizerG.update(&generator, along: 𝛁generator)
    
            // 2) Update the discriminator.
            ...
            let 𝛁discriminator = TensorFlow.gradient(at: discriminator) { discriminator -> Tensor<Float> in
                ...
                return loss
            }
            // Update the differentiable variables of the discriminator along the gradients (`𝛁discriminator`) 
            // with the Adam optimizer.
            adamOptimizerD.update(&discriminator, along: 𝛁discriminator)
        }
    }
    

    Declaration

    public class Adam<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative
        & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float

    Parameters

    learningRate

    A Float. The learning rate (default value: 1e-3).

    beta1

    A Float. The exponential decay rate for the 1st moment estimates (default value: 0.9).

    beta2

    A Float. The exponential decay rate for the 2nd moment estimates (default value: 0.999).

    epsilon

    A Float. A small scalar added to the denominator to improve numerical stability (default value: 1e-8).

    decay

    A Float. The learning rate decay (default value: 0).

  • AdaMax optimizer.

    A variant of Adam based on the infinity-norm.

    Reference: Section 7 of “Adam - A Method for Stochastic Optimization”

    Declaration

    public class AdaMax<Model: Differentiable & KeyPathIterable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
        & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
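
    The infinity-norm variant replaces Adam's second-moment average with a running elementwise maximum. A sketch for a single tensor (illustrative names; bias correction of the first moment is omitted for brevity):

    import TensorFlow

    // One AdaMax step for a single tensor (sketch only).
    func adaMaxStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        firstMoment: inout Tensor<Float>,
        infinityNorm: inout Tensor<Float>,
        learningRate: Float = 2e-3, beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
    ) {
        // Exponentially decaying first-moment estimate, as in Adam.
        firstMoment = beta1 * firstMoment + (1 - beta1) * gradient
        // Infinity-norm-based second moment: a running elementwise maximum instead of an average.
        infinityNorm = max(beta2 * infinityNorm, abs(gradient))
        parameter -= learningRate * firstMoment / (infinityNorm + epsilon)
    }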
  • AMSGrad optimizer.

    This algorithm is a modification of Adam with better convergence properties when close to local optima.

    Reference: “On the Convergence of Adam and Beyond”

    Declaration

    public class AMSGrad<Model: Differentiable & KeyPathIterable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
        & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
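
    The modification can be sketched for a single tensor as follows (illustrative names; bias correction omitted for brevity; not the library's implementation):

    import TensorFlow

    // One AMSGrad step for a single tensor (sketch only).
    func amsGradStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        firstMoment: inout Tensor<Float>,
        secondMoment: inout Tensor<Float>,
        secondMomentMax: inout Tensor<Float>,
        learningRate: Float = 1e-3, beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
    ) {
        firstMoment = beta1 * firstMoment + (1 - beta1) * gradient
        secondMoment = beta2 * secondMoment + (1 - beta2) * gradient * gradient
        // The key difference from Adam: the denominator never shrinks.
        secondMomentMax = max(secondMomentMax, secondMoment)
        parameter -= learningRate * firstMoment / (sqrt(secondMomentMax) + epsilon)
    }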
  • RAdam optimizer.

    Rectified Adam, a variant of Adam that introduces a term to rectify the adaptive learning rate variance.

    Reference: “On the Variance of the Adaptive Learning Rate and Beyond” (https://arxiv.org/pdf/1908.03265.pdf)

    Declaration

    public class RAdam<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
        & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
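
    The rectification term can be sketched for a single tensor as follows. This follows the published algorithm with illustrative names, not necessarily the library's implementation.

    import TensorFlow
    import Foundation

    // One RAdam step for a single tensor (sketch of the paper's algorithm).
    func rAdamStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        firstMoment: inout Tensor<Float>, secondMoment: inout Tensor<Float>,
        step: Int, learningRate: Float = 1e-3,
        beta1: Float = 0.9, beta2: Float = 0.999, epsilon: Float = 1e-8
    ) {
        firstMoment = beta1 * firstMoment + (1 - beta1) * gradient
        secondMoment = beta2 * secondMoment + (1 - beta2) * gradient * gradient
        let beta1Power = Float(pow(Double(beta1), Double(step)))
        let beta2Power = Float(pow(Double(beta2), Double(step)))
        let correctedFirst = firstMoment / (1 - beta1Power)
        // Length of the approximated simple moving average and its limit.
        let rhoInf = 2 / (1 - beta2) - 1
        let rho = rhoInf - 2 * Float(step) * beta2Power / (1 - beta2Power)
        if rho > 4 {
            // Variance of the adaptive learning rate is tractable: rectify and adapt.
            let rectification = ((rho - 4) * (rho - 2) * rhoInf /
                ((rhoInf - 4) * (rhoInf - 2) * rho)).squareRoot()
            let correctedSecond = sqrt(secondMoment / (1 - beta2Power))
            parameter -= learningRate * rectification * correctedFirst / (correctedSecond + epsilon)
        } else {
            // Otherwise fall back to an un-adapted, momentum-style step.
            parameter -= learningRate * correctedFirst
        }
    }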
  • Stochastic gradient descent (SGD) optimizer.

    An optimizer that implements stochastic gradient descent, with support for momentum, learning rate decay, and Nesterov momentum.

    Declaration

    public class SGD<Model: Differentiable>: Optimizer
    where
      Model.TangentVector: VectorProtocol & ElementaryFunctions & KeyPathIterable,
      Model.TangentVector.VectorSpaceScalar == Float
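
    The momentum and Nesterov variants can be sketched for a single tensor as follows (illustrative names; learning rate decay is omitted for brevity; not the library's implementation):

    import TensorFlow

    // One SGD step with optional (Nesterov) momentum for a single tensor (sketch only).
    func sgdStep(
        parameter: inout Tensor<Float>, gradient: Tensor<Float>,
        velocity: inout Tensor<Float>,
        learningRate: Float = 0.01, momentum: Float = 0.9, nesterov: Bool = false
    ) {
        // Accumulate a velocity that smooths successive gradients.
        velocity = momentum * velocity - learningRate * gradient
        if nesterov {
            // Nesterov momentum: look ahead along the accumulated velocity.
            parameter += momentum * velocity - learningRate * gradient
        } else {
            parameter += velocity
        }
    }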
  • A TensorFlow checkpoint file reader.

    Declaration

    public class TensorFlowCheckpointReader