Decorator to define a function with a custom gradient.

This decorator allows fine grained control over the gradients of a sequence for operations. This may be useful for multiple reasons, including providing a more efficient or numerically stable gradient for a sequence of operations.

For example, consider the following function that commonly occurs in the computation of cross entropy and log likelihoods:

def log1pexp(x):
  return tf.math.log(1 + tf.exp(x))

Due to numerical instability, the gradient of this function evaluated at x=100 is NaN. For example:

x = tf.constant(100.)
y = log1pexp(x)
dy = tf.gradients(y, x) # Will be NaN when evaluated.

The gradient expression can be analytically simplified to provide numerical stability:

def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad

With this definition, the gradient at x=100 will be correctly evaluated as 1.0.

Nesting custom gradients can lead to unintuitive results. The default behavior does not correspond to n-th order derivatives. For example

def op(x):
  y = op1(x)
  def grad_fn(dy):
    gdy = op2(x, y, dy)
    def grad_grad_fn(ddy):  # Not the 2nd order gradient of op w.r.t. x.
      return op3(x, y, dy, ddy)
    return gdy, grad_grad_fn
  return y, grad_fn

The function grad_grad_fn will be calculating the first order gradient of grad_fn with respect to dy, which is used to generate forward-mode gradient graphs from backward-mode gradient graphs, but is not the same as the second order gradient of op with respect to x.

Instead, wrap nested @tf.custom_gradients in ano