tf.raw_ops.StopGradient
Stops gradient computation.
View aliases
Compat aliases for migration
See the Migration guide for more details.
tf.compat.v1.raw_ops.StopGradient
tf.raw_ops.StopGradient(
    input, name=None
)
When executed in a graph, this op outputs its input tensor as-is.
When building ops to compute gradients, this op prevents the contribution of
its inputs from being taken into account. Normally, the gradient generator adds ops
to a graph to compute the derivatives of a specified 'loss' by recursively
finding the inputs that contributed to its computation. If you insert this op
in the graph, its inputs are masked from the gradient generator; they are not
taken into account for computing gradients.
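For example, here is a minimal sketch (illustrative values, not part of the original reference) showing that a factor wrapped in StopGradient is treated as a constant when gradients are computed:

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = x * x                                   # both factors contribute to the gradient
    z = tf.raw_ops.StopGradient(input=x) * x    # the stopped factor is masked

print(tape.gradient(y, x).numpy())  # 6.0 = 2 * x
print(tape.gradient(z, x).numpy())  # 3.0: only the un-stopped factor contributes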
This is useful any time you want to compute a value with TensorFlow but need
to pretend that the value was a constant. For example, the softmax function
for a vector x can be written as
def softmax(x):
    numerator = tf.exp(x)
    denominator = tf.reduce_sum(numerator)
    return numerator / denominator
This, however, is susceptible to overflow if the values in x are large. An
alternative, more stable way is to subtract the maximum of x from each of the
values.
def stable_softmax(x):
    z = x - tf.reduce_max(x)
    numerator = tf.exp(z)
    denominator = tf.reduce_sum(numerator)
    return numerator / denominator
However, when we backprop through the softmax to x, we don't want to backprop
through the tf.reduce_max(x) calculation (if the max values are not unique,
the gradient could flow to the wrong input); we want to treat that calculation
as a constant. Therefore, we should write this out as
def stable_softmax(x):
    z = x - tf.stop_gradient(tf.reduce_max(x))
    numerator = tf.exp(z)
    denominator = tf.reduce_sum(numerator)
    return numerator / denominator
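As a usage sketch (the input values here are illustrative, not from the reference), the gradient still flows to x through z, but not through the subtracted maximum:

x = tf.constant([1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    # take the first component's probability, just to get a scalar loss
    loss = stable_softmax(x)[0]
print(tape.gradient(loss, x).numpy())  # same gradient the naive softmax would give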
Some other examples include:
- The EM algorithm where the M-step should not involve backpropagation
through the output of the E-step.
- Contrastive divergence training of Boltzmann machines where, when
differentiating the energy function, the training must not backpropagate
through the graph that generated the samples from the model.
- Adversarial training, where no backprop should happen through the adversarial
example generation process (a minimal sketch follows this list).
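Here is a hedged sketch of that adversarial-training case; model, loss_fn, x, y, and epsilon are placeholder names not taken from this reference. The adversarial examples are built from gradients of the loss, then wrapped in stop_gradient so that training does not backpropagate through their construction.

def adversarial_step(model, loss_fn, x, y, epsilon=0.1):
    # Build FGSM-style adversarial examples from the input gradient.
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    x_adv = x + epsilon * tf.sign(tape.gradient(loss, x))
    # Treat the generated examples as constants: no backprop through their construction.
    x_adv = tf.stop_gradient(x_adv)
    with tf.GradientTape() as tape:
        adv_loss = loss_fn(y, model(x_adv))
    return tape.gradient(adv_loss, model.trainable_variables)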
Args
  input: A Tensor.
  name: A name for the operation (optional).

Returns
  A Tensor. Has the same type as input.