Decorator to define a function with a custom gradient.
tf.custom_gradient(f=None)
This decorator allows fine-grained control over the gradients of a sequence of operations. This is useful, for example, when a more efficient or more numerically stable gradient can be provided for that sequence than the one automatic differentiation would derive.
For example, consider the following function that commonly occurs in the computation of cross entropy and log likelihoods:
def log1pexp(x): return tf.math.log(1 + tf.exp(x))
Due to numerical instability, the gradient of this function evaluated at x=100 is NaN. For example:
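x = tf.constant(100.)
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so it must be watched explicitly
    y = log1pexp(x)
dy_dx = tape.gradient(y, x)  # Will be NaN when evaluated.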
The gradient expression can be analytically simplified to provide numerical stability:
@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.math.log(1 + e), grad
With this definition, the gradient at x=100 will be correctly evaluated as 1.0.
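To see where the NaN comes from, the arithmetic can be reproduced without TensorFlow. The following is a minimal sketch, not how TensorFlow computes gradients internally: `exp32`, `naive_grad` and `stable_grad` are hypothetical helper names, and `exp32` only mimics the overflow of a float32 exponential by returning infinity above the float32 range.

```python
import math

FLOAT32_MAX = 3.4028235e38  # largest finite float32 value


def exp32(x):
    """exp(x) that overflows to inf where a float32 result would.

    Assumption: this only imitates the float32 range, not its precision.
    """
    try:
        e = math.exp(x)
    except OverflowError:
        return math.inf
    return e if e <= FLOAT32_MAX else math.inf


def naive_grad(x):
    # Backpropagate op by op through log(1 + exp(x)).
    e = exp32(x)
    d_log = 1.0 / (1.0 + e)  # derivative of log(u) at u = 1 + e
    return d_log * e         # times the derivative of exp(x), which is e


def stable_grad(x):
    # Analytically simplified form used by the custom gradient.
    e = exp32(x)
    return 1.0 - 1.0 / (1.0 + e)


print(naive_grad(100.0))   # nan, because 0.0 * inf is undefined
print(stable_grad(100.0))  # 1.0
print(naive_grad(0.0), stable_grad(0.0))  # 0.5 0.5
```

Both forms agree wherever `exp(x)` stays finite; they differ only once the exponential overflows, which is the case the custom gradient is meant to handle.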
The argument dy of grad is the upstream gradient, i.e. the gradient of the final result with respect to this function's output, accumulated from
all the layers or functions that consume that output.
By the chain rule, writing y for the final result and x for the original input, we know that
dy/dx = dy/dx_0 * dx_0/dx_1 * ... * dx_i/dx_(i+1) * ... * dx_n/dx
In this case the gradient of our current function is
dx_i/dx_(i+1) = (1 - 1 / (1 + e)). The upstream gradient
passed to grad is the product of the factors to its left,
dy/dx_0 * dx_0/dx_1 * ... * dx_(i-1)/dx_i. The upstream gradient
multiplied by the current gradient is then passed downstream.
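The chain-rule bookkeeping can be traced by hand with two functions that each return a value together with a gradient callback, mirroring the shape of a custom-gradient function. This is an illustrative sketch in plain Python, not TensorFlow's implementation; `log1pexp_with_grad` and `scale_with_grad` are hypothetical names.

```python
import math


def log1pexp_with_grad(x):
    e = math.exp(x)

    def grad(upstream):
        # Upstream gradient times this function's local gradient.
        return upstream * (1 - 1 / (1 + e))

    return math.log(1 + e), grad


def scale_with_grad(u, factor=3.0):
    def grad(upstream):
        return upstream * factor

    return factor * u, grad


# Forward pass: y = 3 * log(1 + exp(x)) at x = 0.
x = 0.0
u, grad_u = log1pexp_with_grad(x)
y, grad_y = scale_with_grad(u)

# Backward pass: start from dy/dy = 1 and walk toward the input.
upstream_for_u = grad_y(1.0)      # dy/du = 3.0
dy_dx = grad_u(upstream_for_u)    # 3.0 * 0.5 = 1.5

print(upstream_for_u, dy_dx)
```

Here the value handed to `grad_u` is exactly the upstream gradient: everything between the final result and the output of `log1pexp_with_grad`.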
If the function takes multiple variables as input, the
grad function must return the same number of gradients, one per input.
We take the function
z = x * y as an example.
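@tf.custom_gradient
def bar(x, y):
    def grad(upstream):
        dz_dx = y  # local gradient of z = x * y with respect to x
        dz_dy = x  # local gradient of z = x * y with respect to y
        return upstream * dz_dx, upstream * dz_dy
    z = x * y
    return z, grad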
x = tf.constant(2.0, dtype=tf.float32)
y = tf.constant(3.0, dtype=tf.float32)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # constants are not watched automatically
    tape.watch(y)
    z = bar(x, y)