Decorator to define a function with a custom gradient.
This decorator allows fine-grained control over the gradients of a sequence of operations. This may be useful for multiple reasons, including providing a more efficient or numerically stable gradient for a sequence of operations.
For example, consider the following function that commonly occurs in the computation of cross entropy and log likelihoods:
```python
import tensorflow as tf

def log1pexp(x):
  return tf.math.log(1 + tf.exp(x))
```
Due to numerical instability, the gradient of this function evaluated at x=100 is NaN. For example:
```python
x = tf.constant(100.)
y = log1pexp(x)
dy = tf.gradients(y, x)  # Will be NaN when evaluated.
```
The gradient expression can be analytically simplified to provide numerical stability:
```python
@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad
```
With this definition, the gradient at x=100 will be correctly evaluated as 1.0.
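As a quick check, the corrected gradient can be verified eagerly with tf.GradientTape; this is a minimal sketch, and the graph-mode tf.gradients call above behaves equivalently:

```python
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad

x = tf.constant(100.)
with tf.GradientTape() as tape:
  tape.watch(x)  # x is a constant, so it must be watched explicitly
  y = log1pexp(x)
print(tape.gradient(y, x))  # 1.0 rather than NaN
```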
See also tf.RegisterGradient, which registers a gradient function for a primitive TensorFlow operation. tf.custom_gradient, on the other hand, allows for fine-grained control over the gradient computation of a sequence of operations.
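For contrast, here is a minimal TF1-style sketch of the tf.RegisterGradient workflow; the registration name "CustomSquareGrad" and the override-map usage are illustrative, not taken from this page:

```python
import tensorflow as tf

# Register a gradient function under a new name...
@tf.RegisterGradient("CustomSquareGrad")
def _custom_square_grad(op, grad):
  # d(x**2)/dx = 2*x; `grad` is the upstream gradient.
  return grad * 2.0 * op.inputs[0]

# ...then redirect the primitive "Square" op to it inside a graph.
g = tf.Graph()
with g.as_default():
  x = tf.constant(3.0)
  with g.gradient_override_map({"Square": "CustomSquareGrad"}):
    y = tf.square(x)
```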
Note that if the decorated function uses Variables, the enclosing variable scope must be using ResourceVariables.
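A minimal sketch of such a scope, assuming the TF1-compat variable_scope API with use_resource=True:

```python
import tensorflow as tf

with tf.compat.v1.variable_scope("model", use_resource=True):
  # Variables created here are ResourceVariables, as required.
  w = tf.compat.v1.get_variable(
      "w", shape=[], initializer=tf.ones_initializer())
```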
Args:
- f: function f(*x) that returns a tuple (y, grad_fn), where:
  - x is a sequence of Tensor inputs to the function.
  - y is a Tensor or sequence of Tensor outputs of applying TensorFlow operations in f to x.
  - grad_fn is a function with the signature g(*grad_ys) which returns a list of Tensors, the derivatives of Tensors in y with respect to the Tensors in x (see the sketch after this list). grad_ys is a Tensor or sequence of Tensors the same size as y holding the initial value gradients for each Tensor in y. In a pure mathematical sense, a vector-argument vector-valued function f's derivatives should be its Jacobian matrix J. Here we are expressing the Jacobian J as a function grad_fn which defines how J will transform a vector grad_ys when left-multiplied with it (grad_ys * J). This functional representation of a matrix is convenient to use for chain-rule calculation (e.g. in the back-propagation algorithm).
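To make the vector-Jacobian reading concrete, here is a minimal sketch; the function cube is a hypothetical example, not part of this API:

```python
import tensorflow as tf

@tf.custom_gradient
def cube(x):
  y = x ** 3
  def grad(dy):
    # `dy` plays the role of grad_ys: the upstream gradient that is
    # left-multiplied with the Jacobian. For this scalar function the
    # Jacobian is simply dy/dx = 3*x**2.
    return dy * 3.0 * tf.square(x)
  return y, grad

x = tf.constant(2.0)
with tf.GradientTape() as tape:
  tape.watch(x)
  y = cube(x)
print(tape.gradient(y, x))  # 12.0
```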
If f uses Variables (that are not part of the inputs), i.e. through get_variable, then grad_fn should have the signature g(*grad_ys, variables=None), where variables is a list of the Variables, and return a 2-tuple (grad_xs, grad_vars), where grad_xs is the same as above, and grad_vars is a list<Tensor> with the derivatives of Tensors in y with respect to the variables (that is, grad_vars has one Tensor per variable in variables).
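A minimal sketch of this variables-aware signature, assuming an illustrative function scale that captures a variable v:

```python
import tensorflow as tf

v = tf.Variable(2.0)

@tf.custom_gradient
def scale(x):
  y = v * x
  def grad(dy, variables=None):
    grad_x = dy * v        # derivative of y with respect to the input x
    grad_vars = [dy * x]   # one Tensor per variable in `variables`
    return grad_x, grad_vars
  return y, grad

x = tf.constant(3.0)
with tf.GradientTape() as tape:
  tape.watch(x)
  y = scale(x)
print(tape.gradient(y, [x, v]))  # [2.0, 3.0]
```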
Returns:
A function h(x) which returns the same value as f(x)[0] and whose gradient (as calculated by tf.gradients) is determined by f(x)[1], i.e. by grad_fn.