tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)
See the guide: Training > Gradient Computation
Constructs symbolic derivatives of the sum of ys with respect to each x in xs.
ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.
gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs), where each tensor is the sum(dy/dx) for y in ys.
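As a rough illustration of the summation over ys, the sketch below differentiates two outputs with respect to a single input. It assumes TensorFlow 1.13 or later (or 2.x), builds the ops inside an explicit tf.Graph so that eager execution is not active, and evaluates them with tf.compat.v1.Session. The tensor names and values are invented for the example.

```python
import tensorflow as tf

# Build inside an explicit graph: tf.gradients is a graph-mode API.
with tf.Graph().as_default():
    x = tf.constant(2.0)
    y1 = x * x      # dy1/dx = 2 * x = 4 at x = 2
    y2 = 3.0 * x    # dy2/dx = 3
    # One result per element of xs; contributions from all ys are summed.
    grads = tf.gradients([y1, y2], [x])
    with tf.compat.v1.Session() as sess:
        grad_values = sess.run(grads)

print(len(grad_values), float(grad_values[0]))  # 1 7.0
```

The returned list has one entry because xs has one element, and that entry is 4 + 3 = 7, the sum of the two individual derivatives.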
grad_ys is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of '1's of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
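A minimal sketch of per-element weighting through grad_ys, under the same assumptions as any graph-mode example (TensorFlow 1.13+ or 2.x, explicit tf.Graph, tf.compat.v1.Session). The weights tensor is a hypothetical choice made only to show the effect.

```python
import tensorflow as tf

with tf.Graph().as_default():
    x = tf.constant([1.0, 2.0, 3.0])
    y = x * x                                # dy_i/dx_i = 2 * x_i
    weights = tf.constant([1.0, 0.0, 0.5])   # hypothetical per-element weights

    # grad_ys=None: the initial gradient is a tensor of ones shaped like y.
    default_grads = tf.gradients([y], [x])
    # Explicit grad_ys: each element of y contributes with its own weight.
    weighted_grads = tf.gradients([y], [x], grad_ys=[weights])

    with tf.compat.v1.Session() as sess:
        default_val, weighted_val = sess.run(
            [default_grads[0], weighted_grads[0]])

print(default_val.tolist())   # [2.0, 4.0, 6.0]
print(weighted_val.tolist())  # [2.0, 0.0, 3.0]
```

Each weighted entry is the default gradient multiplied by the corresponding weight, so a weight of 0.0 removes that element's contribution entirely.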
stop_gradients is a Tensor or a list of tensors to be considered constant with respect to all xs. These tensors will not be backpropagated through, as though they had been explicitly disconnected using stop_gradient. Among other things, this allows computation of partial derivatives as opposed to total derivatives. For example:
    import tensorflow as tf

    a = tf.constant(0.)
    b = 2 * a
    g = tf.gradients(a + b, [a, b], stop_gradients=[a, b])
Here the partial derivatives g evaluate to [1.0, 1.0], compared to the total derivatives tf.gradients(a + b, [a, b]), which take into account the influence of a on b and evaluate to [3.0, 1.0]. Note that the above is equivalent to:
    a = tf.stop_gradient(tf.constant(0.))
    b = tf.stop_gradient(2 * a)
    g = tf.gradients(a + b, [a, b])
stop_gradients provides a way of stopping gradients after the graph has already been constructed, whereas tf.stop_gradient is used during graph construction. When the two approaches are combined, backpropagation stops at both tf.stop_gradient nodes and nodes in stop_gradients, whichever is encountered first.
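The [1.0, 1.0] and [3.0, 1.0] values quoted above can be checked end to end with a sketch like the following. It assumes TensorFlow 1.13+ or 2.x, wraps graph construction in an explicit tf.Graph, and uses tf.compat.v1.Session to evaluate the results. The variable names partial and total are illustrative only.

```python
import tensorflow as tf

with tf.Graph().as_default():
    a = tf.constant(0.)
    b = 2 * a

    # Treat a and b as constants: partial derivatives of a + b.
    partial = tf.gradients(a + b, [a, b], stop_gradients=[a, b])
    # No stop_gradients: the dependence of b on a is followed.
    total = tf.gradients(a + b, [a, b])

    with tf.compat.v1.Session() as sess:
        partial_val, total_val = sess.run([partial, total])

print([float(v) for v in partial_val])  # [1.0, 1.0]
print([float(v) for v in total_val])    # [3.0, 1.0]
```

Both calls operate on the same already-built graph, which is the situation where stop_gradients is more convenient than rewriting the graph with tf.stop_gradient.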
All integer tensors are considered constant with respect to all xs, as if they were included in stop_gradients.
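One way to observe the integer rule is to route a float value through an integer tensor and ask for the gradient. The sketch below assumes TensorFlow 1.13+ or 2.x in graph mode and the default handling of unconnected gradients, under which a blocked path should come back as None rather than a zero tensor.

```python
import tensorflow as tf

with tf.Graph().as_default():
    x = tf.constant(2.7)
    # The only path from x to y passes through an int32 tensor.
    as_int = tf.cast(x, tf.int32)
    y = tf.cast(as_int, tf.float32)
    # Backpropagation does not continue through the integer tensor.
    grads = tf.gradients(y, [x])

print(grads)
```

No session is needed here because the result is inspected at graph-construction time: the list holds one entry per x, and the entry is None when no differentiable path exists.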
Args:
ys: A Tensor or list of tensors to be differentiated.
xs: A Tensor or list of tensors to be used for differentiation.
grad_ys: Optional. A Tensor or list of tensors the same size as ys and holding the gradients computed for each y in ys.
name: Optional name to use for grouping all the gradient ops together. Defaults to 'gradients'.
colocate_gradients_with_ops: If True, try colocating gradients with the corresponding op.
gate_gradients: If True, add a tuple around the gradients returned for an operation. This avoids some race conditions.
aggregation_method: Specifies the method used to combine gradient terms. Accepted values are constants defined in the class AggregationMethod.
stop_gradients: Optional. A Tensor or list of tensors not to differentiate through.
Returns:
A list of sum(dy/dx) for each x in xs.
Raises:
LookupError: if one of the operations between x and y does not have a registered gradient function.
ValueError: if the arguments are invalid.
RuntimeError: if called in Eager mode.
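The eager-mode restriction can be seen with a short sketch. It assumes a TensorFlow build that provides tf.executing_eagerly(); eager execution is the default in 2.x, while in 1.x the branch is simply skipped unless eager execution was enabled at startup. The raised flag is a name introduced only for this example.

```python
import tensorflow as tf

raised = False
if tf.executing_eagerly():  # default in TensorFlow 2.x
    x = tf.constant(3.0)
    try:
        tf.gradients(x * x, [x])
    except RuntimeError as err:
        # tf.gradients only works while a graph is being built.
        raised = True
        print("tf.gradients refused to run eagerly:", err)
```

Wrapping the same call in `with tf.Graph().as_default():` avoids the error, because ops created inside that block are added to a graph instead of being executed immediately.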