Sharp edges in Differentiable Swift

Differentiable Swift has come a long way in terms of usability. Here is a heads-up about the parts that are still a little un-obvious. As progress continues, this guide will become smaller and smaller, and you'll be able to write differentiable code without needing special syntax.

Loops

Loops are differentiable, there's just one detail to know about. When you write the loop, wrap the bit where you specify what you're looping over in withoutDerivative(at:)

var a: [Float] = [1,2,3]

for example:

for _ in a.indices 
{}

becomes

for _ in withoutDerivative(at: a.indices) 
{}

or:

for _ in 0..<a.count 
{}

becomes

for _ in 0..<withoutDerivative(at: a.count) 
{}

This is necessary because the Array.count member doesn't contribute to the derivative with respect to the array. Only the actual elements in the array contribute to the derivative.

If you've got a loop where you manually use an integer as the upper bound, there's no need to use withoutDerivative(at:):

let iterations: Int = 10
for _ in 0..<iterations {} //this is fine as-is.

Map and Reduce

map and reduce have special differentiable versions that work exactly like what you're used to:

a = [1,2,3]
let aPlusOne = a.differentiableMap {$0 + 1}
let aSum = a.differentiableReduce(0, +)
print("aPlusOne", aPlusOne)
print("aSum", aSum)
aPlusOne [2.0, 3.0, 4.0]
aSum 6.0

Array subscript sets

Array subscript sets (array[0] = 0) aren't differentiable out of the box, but you can paste this extension:

extension Array where Element: Differentiable {
    @differentiable(where Element: Differentiable)
    mutating func updated(at index: Int, with newValue: Element) {
        self[index] = newValue
    }

    @derivative(of: updated)
    mutating func vjpUpdated(at index: Int, with newValue: Element)
      -> (value: Void, pullback: (inout TangentVector) -> (Element.TangentVector))
    {
        self.updated(at: index, with: newValue)
        return ((), { v in
            let dElement = v[index]
            v.base[index] = .zero
            return dElement
        })
    }
}

and then the workaround syntax is like this:

var b: [Float] = [1,2,3]

instead of this:

b[0] = 17

write this:

b.updated(at: 0, with: 17)

Let's make sure it works:

func plusOne(array: [Float]) -> Float{
  var array = array
  array.updated(at: 0, with: array[0] + 1)
  return array[0]
}

let plusOneValAndGrad = valueWithGradient(at: [2], in: plusOne)
print(plusOneValAndGrad)
(value: 3.0, gradient: [1.0])

The error you'll get without this workaround is Differentiation of coroutine calls is not yet supported. Here is the link to see progress on making this workaround unnecessary: https://bugs.swift.org/browse/TF-1277 (it talks about Array.subscript._modify, which is what's called behind the scenes when you do an array subscript set).

Float <-> Double conversions

If you're switching between Float and Double, their constructors aren't already differentiable. Here's a function that will let you go from a Float to a Double differentiably.

(Switch Float and Double in the below code, and you've got a function that converts from Double to Float.)

You can make similar converters for any other real Numeric types.

@differentiable
func convertToDouble(_ a: Float) -> Double {
    return Double(a)
}

@derivative(of: convertToDouble)
func convertToDoubleVJP(_ a: Float) -> (value: Double, pullback: (Double) -> Float) {
    func pullback(_ v: Double) -> Float{
        return Float(v)
    }
    return (value: Double(a), pullback: pullback)
}

Here's an example usage:

@differentiable
func timesTwo(a: Float) -> Double {
  return convertToDouble(a * 2)
}
let input: Float = 3
let valAndGrad = valueWithGradient(at: input, in: timesTwo)
print("grad", valAndGrad.gradient)
print("type of input:", type(of: input))
print("type of output:", type(of: valAndGrad.value))
print("type of gradient:", type(of: valAndGrad.gradient))
grad 2.0
type of input: Float
type of output: Double
type of gradient: Float

Transcendental and other functions (sin, cos, abs, max)

A lot of transcendentals and other common built-in functions have already been made differentiable for Float and Double. There are fewer for Double than Float. Some aren't available for either. So here are a few manual derivative definitions to give you the idea of how to make what you need, in case it isn't already provided:

pow (see link for derivative explanation)

import Foundation

@usableFromInline
@derivative(of: pow) 
func powVJP(_ base: Double, _ exponent: Double) -> (value: Double, pullback: (Double) -> (Double, Double)) {
    let output: Double = pow(base, exponent)
    func pullback(_ vector: Double) -> (Double, Double) {
        let baseDerivative = vector * (exponent * pow(base, exponent - 1))
        let exponentDerivative = vector * output * log(base)
        return (baseDerivative, exponentDerivative)
    }

    return (value: output, pullback: pullback)
}

max

@usableFromInline
@derivative(of: max)
func maxVJP<T: Comparable & Differentiable>(_ x: T, _ y: T) -> (value: T, pullback: (T.TangentVector)
  -> (T.TangentVector, T.TangentVector))
{
    func pullback(_ v: T.TangentVector) -> (T.TangentVector, T.TangentVector) {
        if x < y {
            return (.zero, v)
        } else {
            return (v, .zero)
        }
    }
    return (value: max(x, y), pullback: pullback)
}

abs

@usableFromInline
@derivative(of: abs)
func absVJP<T: Comparable & SignedNumeric & Differentiable>(_ x: T)
  -> (value: T, pullback: (T.TangentVector) -> T.TangentVector)
{
    func pullback(_ v: T.TangentVector) -> T.TangentVector{
        if x < 0 {
            return .zero - v
        }
        else {
            return v
        }
    }
    return (value: abs(x), pullback: pullback)
}

sqrt (see link for derivative explanation)

@usableFromInline
@derivative(of: sqrt) 
func sqrtVJP(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let output = sqrt(x)
    func pullback(_ v: Double) -> Double {
        return v / (2 * output)
    }
    return (value: output, pullback: pullback)
}

Let's check that these work:

let powGrad = gradient(at: 2, 2, in: pow)
print("pow gradient: ", powGrad, "which is", powGrad == (4.0, 2.772588722239781) ? "correct" : "incorrect")

let maxGrad = gradient(at: 1, 2, in: max)
print("max gradient: ", maxGrad, "which is", maxGrad == (0.0, 1.0) ? "correct" : "incorrect")

let absGrad = gradient(at: 2, in: abs)
print("abs gradient: ", absGrad, "which is", absGrad == 1.0 ? "correct" : "incorrect")

let sqrtGrad = gradient(at: 4, in: sqrt)
print("sqrt gradient: ", sqrtGrad, "which is", sqrtGrad == 0.25 ? "correct" : "incorrect")
pow gradient:  (4.0, 2.772588722239781) which is correct
max gradient:  (0.0, 1.0) which is correct
abs gradient:  1.0 which is correct
sqrt gradient:  0.25 which is correct

The compiler error that alerts you to the need for something like this is: Expression is not differentiable. Cannot differentiate functions that have not been marked '@differentiable' and that are defined in other files

KeyPath subscripting

KeyPath subscripting (get or set) doesn't work out of the box, but once again, there are some extensions you can add, and then use a workaround syntax. Here it is:

https://github.com/tensorflow/swift/issues/530#issuecomment-687400701

This workaround is a little uglier than the others. It only works for custom objects, which must conform to Differentiable and AdditiveArithmetic. You have to add a .tmp member and a .read() function, and you use the .tmp member as intermediate storage when doing KeyPath subscript gets (there is an example in the linked code). KeyPath subscript sets work pretty simply with a .write() function.