Constructs symbolic derivatives of the sum of `ys` w.r.t. each x in `xs`.

`tf.gradients` is only valid in a graph context. In particular, it is valid in the context of a `tf.function` wrapper, where code is executing as a graph.
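As a minimal sketch of the graph-only restriction, the snippet below calls the same helper once eagerly and once through `tf.function`. The helper name `grad_of_square` and the input value are illustrative assumptions, and the sketch assumes TensorFlow 2.x with eager execution enabled by default.

```python
import tensorflow as tf

def grad_of_square(x):
    # y = x^2, so dy/dx = 2x.
    y = x * x
    return tf.gradients(y, [x])[0]

# Eager call: tf.gradients raises RuntimeError outside a graph context.
try:
    grad_of_square(tf.constant(3.0))
    raised_in_eager = False
except RuntimeError:
    raised_in_eager = True

# Wrapped in tf.function, the same code is traced into a graph and works.
graph_grad = tf.function(grad_of_square)(tf.constant(3.0))
```

In eager code, `tf.GradientTape` is the usual alternative.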

`ys` and `xs` are each a `Tensor` or a list of tensors. `grad_ys` is a list of `Tensor`, holding the gradients received by the `ys`. The list must be the same length as `ys`.

`gradients()` adds ops to the graph to output the derivatives of `ys` with respect to `xs`. It returns a list of `Tensor` of length `len(xs)` where each tensor is the `sum(dy/dx)` for y in `ys` and for x in `xs`.
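The summation over `ys` can be illustrated with a small sketch. The function name `summed_gradients` and the constants are assumptions chosen only so that the arithmetic is easy to check by hand.

```python
import tensorflow as tf

@tf.function
def summed_gradients():
    x = tf.constant(2.0)
    z = tf.constant(3.0)
    y1 = x * x        # dy1/dx = 2x = 4, dy1/dz = 0
    y2 = 3.0 * x + z  # dy2/dx = 3,      dy2/dz = 1
    # One tensor per entry of xs, each summed over both ys.
    return tf.gradients([y1, y2], [x, z])

dx, dz = summed_gradients()
# dx is dy1/dx + dy2/dx; dz is dy1/dz + dy2/dz.
```

The returned list has two entries because `xs` has two entries, regardless of how many `ys` are passed.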

`grad_ys` is a list of tensors of the same length as `ys` that holds the initial gradients for each y in `ys`. When `grad_ys` is None, we fill in a tensor of '1's of the shape of y for each y in `ys`. A user can provide their own initial `grad_ys` to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
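A short sketch of how `grad_ys` weights the result, assuming an elementwise `y = x * x`. The name `weighted_gradients` and the weight values are hypothetical and used only for illustration.

```python
import tensorflow as tf

@tf.function
def weighted_gradients():
    x = tf.constant([1.0, 2.0, 3.0])
    y = x * x  # elementwise, so dy/dx = 2x = [2, 4, 6]
    # grad_ys=None: the initial gradient is a tensor of ones shaped like y.
    default = tf.gradients(y, [x])[0]
    # Custom initial gradient: scales each element's contribution.
    weights = tf.constant([1.0, 0.0, 0.5])
    weighted = tf.gradients(y, [x], grad_ys=[weights])[0]
    return default, weighted

default, weighted = weighted_gradients()
```

Here each element of `weighted` is the corresponding element of `default` multiplied by its weight.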

`stop_gradients` is a `Tensor` or a list of tensors to be considered constant with respect to all `xs`. These tensors will not be backpropagated through, as though they had been explicitly disconnected using `stop_gradient`. Among other things, this allows computation of partial derivatives as opposed to total derivatives. For example:

```
>>> import tensorflow as tf
>>> @tf.function
... def example():
...   a = tf.constant(0.)
...   b = 2 * a
...   return tf.gradients(a + b, [a, b], stop_gradients=[a, b])
>>> example()
[<tf.Tensor: shape=(), dtype=float32, numpy=1.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>]
```

Here the partial derivatives evaluate to `[1.0, 1.0]`, compared to the total derivatives `tf.gradients(a + b, [a, b])`, which take into account the influence of `a` on `b` and evaluate to `[3.0, 1.0]`. Note that the above is equivalent to:

```
>>> @tf.function
... def example():
...   a = tf.stop_gradient(tf.constant(0.))
...   b = tf.stop_gradient(2 * a)
...   return tf.gradients(a + b, [a, b])
>>> example()
[<tf.Tensor: shape=(), dtype=float32, numpy=1.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>]
```

`stop_gradients` provides a way of stopping gradient after the graph has already been constructed, as compared to `tf.stop_gradient` which is used during graph construction. When the two approaches are combined, backpropagation stops at both `tf.stop_gradient` nodes and nodes in `stop_gradients`, whichever is encountered first.
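The combination of the two mechanisms might look like the following sketch. The function name `combined_stops` and the graph itself are assumptions made up for illustration: `c` is blocked during construction with `tf.stop_gradient`, and `b` is blocked afterwards through the `stop_gradients` argument.

```python
import tensorflow as tf

@tf.function
def combined_stops():
    a = tf.constant(1.0)
    b = 2.0 * a
    c = tf.stop_gradient(3.0 * a)  # blocked at graph-construction time
    y = a + b + c
    # Only c is blocked: dy/da = 1 (direct) + 2 (via b) = 3.
    construction_only = tf.gradients(y, [a])[0]
    # b is additionally treated as constant: dy/da = 1 (direct path only).
    both = tf.gradients(y, [a], stop_gradients=[b])[0]
    return construction_only, both

construction_only, both = combined_stops()
```

The same graph yields different gradients depending on which tensors are passed in `stop_gradients`, without rebuilding `y`.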

All integer tensors are considered constant with respect to all `xs`, as if they were included in `stop_gradients`.

`unconnected_gradients` determines the value returned for each x in `xs` if it is unconnected in the graph to `ys`. By default this is `None`, to safeguard against errors. Mathematically these gradients are zero, which can be requested using the `'zero'` option. `tf.UnconnectedGradients` provides the following options and behaviors:

```
>>> @tf.function
... def example(use_zero):
...   a = tf.ones([1, 2])
...   b = tf.ones([3, 1])
...   if use_zero:
...     return tf.gradients([b], [a], unconnected_gradients='zero')
...   else:
...     return tf.gradients([b], [a], unconnected_gradients='none')
>>> example(False)
[None]
>>> example(True)
[<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0., 0.]], ...)>]
```
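As a further sketch, the option can also be passed as a `tf.UnconnectedGradients` constant, and it only affects the entries of `xs` that are actually unconnected. The helper names `grads`, `with_none`, and `with_zero` below are hypothetical.

```python
import tensorflow as tf

def grads(unconnected):
    a = tf.constant([1.0, 2.0])
    c = tf.constant([5.0, 6.0])  # never used to compute y
    y = 2.0 * a
    return tf.gradients(y, [a, c], unconnected_gradients=unconnected)

@tf.function
def with_none():
    return grads(tf.UnconnectedGradients.NONE)

@tf.function
def with_zero():
    return grads(tf.UnconnectedGradients.ZERO)

grad_a, grad_c_none = with_none()  # c is unconnected, so its entry is None
_, grad_c_zero = with_zero()       # c's entry is a zero tensor shaped like c
```

The gradient for the connected tensor `a` is the same under both options.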

Consider a practical example from the backpropagation phase, where this function is used to evaluate the derivatives of the cost function with respect to weights `Ws` and biases `bs`. The sample implementation below illustrates this use:

```
>>> @tf.function
... def example():
...   Ws = tf.constant(0.)
...   bs = 2 * Ws
...   cost = Ws + bs  # This is just an example. Please ignore the formulas.
...   g = tf.gradients(cost, [Ws, bs])
...   dCost_dW, dCost_db = g
...   return dCost_dW, dCost_db
>>> example()
(<tf.Tensor: shape=(), dtype=float32, numpy=3.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>)
```
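For a cost that more closely resembles a training loss, the sketch below assumes a one-variable linear model with a mean squared error cost. The model, the data values, and the name `cost_gradients` are illustrative assumptions rather than anything prescribed by the API.

```python
import tensorflow as tf

@tf.function
def cost_gradients():
    x = tf.constant([1.0, 2.0, 3.0])
    y_true = tf.constant([2.0, 4.0, 6.0])
    W = tf.constant(0.0)
    b = tf.constant(0.0)
    y_pred = W * x + b                                 # linear model
    cost = tf.reduce_mean(tf.square(y_pred - y_true))  # mean squared error
    # dCost/dW = mean(2 * (y_pred - y_true) * x)
    # dCost/db = mean(2 * (y_pred - y_true))
    dCost_dW, dCost_db = tf.gradients(cost, [W, b])
    return dCost_dW, dCost_db

dCost_dW, dCost_db = cost_gradients()
```

With `W = b = 0` the errors are `[-2, -4, -6]`, so the gradients come out to `-56/3` for `W` and `-8` for `b`.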

Args:
`ys`: A `Tensor` or list of tensors to be differentiated.
`xs`: A `Tensor` or list of tensors to be used for differentiation.
`grad_ys`: Optional. A `Tensor` or list of tensors the same length as `ys`, holding the initial gradients for each y in `ys`.
`name`: Optional name to use for grouping all the gradient ops together. Defaults to 'gradients'.
`gate_gradients`: If True, add a tuple around the gradients returned for an operation. This avoids some race conditions.
`aggregation_method`: Specifies the method used to combine gradient terms. Accepted values are constants defined in the class `AggregationMethod`.
`stop_gradients`: Optional. A `Tensor` or list of tensors not to differentiate through.
`unconnected_gradients`: Optional. Specifies the gradient value returned when the given input tensors are unconnected. Accepted values are constants defined in the class `tf.UnconnectedGradients`, and the default value is `none`.

Returns:
A list of `Tensor` of length `len(xs)` where each tensor is the `sum(dy/dx)` for y in `ys` and for x in `xs`.

Raises:
`LookupError`: if one of the operations between `x` and `y` does not have a registered gradient function.
`ValueError`: if the arguments are invalid.
`RuntimeError`: if called in Eager mode.
