Record operations for automatic differentiation.
See Migration guide for more details.
tf.GradientTape( persistent=False, watch_accessed_variables=True )
Operations are recorded if they are executed within this context manager and at least one of their inputs is being "watched".
Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is default in both cases) are automatically watched. Tensors can be manually watched by invoking the watch method on this context manager.
For example, consider the function y = x * x. The gradient at x = 3.0 can be computed as:
x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
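Trainable variables, by contrast, need no explicit watch call. As a minimal sketch (the names `v` and `frozen` are illustrative and not part of the original page), the tape below records operations on a trainable tf.Variable automatically, while a variable created with trainable=False is not watched and yields no gradient:

```python
import tensorflow as tf

v = tf.Variable(3.0)                        # trainable=True by default
frozen = tf.Variable(2.0, trainable=False)  # not watched automatically

with tf.GradientTape() as g:
  # No g.watch(...) call: only the trainable variable is tracked.
  y = v * v + frozen * frozen

dy_dv, dy_dfrozen = g.gradient(y, [v, frozen])
print(dy_dv)       # scalar tensor holding 6.0
print(dy_dfrozen)  # None, because `frozen` was never watched
```

If a gradient for `frozen` were needed, calling g.watch(frozen) inside the context would make the tape track it.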
GradientTapes can be nested to compute higher-order derivatives. For example,
x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  with tf.GradientTape() as gg:
    gg.watch(x)
    y = x * x
  dy_dx = gg.gradient(y, x)     # Will compute to 6.0
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 2.0
By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, because resources are released only when the tape object is garbage collected. For example:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
  g.watch(x)
  y = x * x
  z = y * y
dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = g.gradient(y, x)  # 6.0
del g  # Drop the reference to the tape
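For contrast, the following sketch shows what happens when the same two gradients are requested from a default, non-persistent tape. The flag `second_call_failed` is a name introduced here for illustration only. The first gradient() call succeeds; the second raises a RuntimeError because the tape's resources have already been released:

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:  # persistent=False by default
  g.watch(x)
  y = x * x
  z = y * y

dz_dx = g.gradient(z, x)  # First call succeeds and releases the tape.

try:
  g.gradient(y, x)  # Second call on a non-persistent tape.
  second_call_failed = False
except RuntimeError as err:
  second_call_failed = True
  print("Second gradient() call failed:", err)
```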
By default, GradientTape will automatically watch any trainable variables that
are accessed inside the context. If you want fine-grained control over which
variables are watched, you can disable automatic tracking by passing
watch_accessed_variables=False to the tape constructor:
with tf.GradientTape(watch_accessed_variables=False) as tape:
  tape.watch(variable_a)
  y = variable_a ** 2  # Gradients will be available for `variable_a`.
  z = variable_b ** 3  # No gradients will be available since `variable_b` is
                       # not being watched.
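The snippet above leaves variable_a and variable_b undefined. A self-contained sketch, assuming two scalar tf.Variable values chosen here purely for illustration, and using persistent=True only so that gradient() can be called twice, might look like this:

```python
import tensorflow as tf

# Illustrative values; any float variables would do.
variable_a = tf.Variable(2.0)
variable_b = tf.Variable(3.0)

with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape:
  tape.watch(variable_a)  # Only `variable_a` is tracked.
  y = variable_a ** 2
  z = variable_b ** 3

dy_da = tape.gradient(y, variable_a)  # 2 * variable_a
dz_db = tape.gradient(z, variable_b)  # None: `variable_b` is not watched
del tape  # Drop the reference to the persistent tape

print(dy_da)
print(dz_db)
```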
Note that when using models you should ensure that your variables exist when
using watch_accessed_variables=False. Otherwise it's quite easy to make your
first iteration not have any gradients:
a = tf.keras.layers.Dense(32)
b = tf.keras.layers.Dense(32)

with tf.GradientTape(watch_accessed_variables=False) as tape:
  tape.watch(a.variables)  # Since `a.build` has not been called at this point
                           # `a.variables` will return an empty list and the
                           # tape will not be watching anything.
  result = b(a(inputs))
tape.gradient(result, a.variables)  # The result of this computation will be
                                    # a list of `None`s since a's variables
                                    # are not being watched.
Note that only tensors with real or complex dtypes are differentiable.
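A short sketch of the dtype restriction, using variable names invented for this example: an integer tensor can be passed to watch, but TensorFlow logs a warning and the resulting gradient is None, whereas the floating-point equivalent differentiates normally.

```python
import tensorflow as tf

x_int = tf.constant(3)      # int32: not differentiable
x_float = tf.constant(3.0)  # float32: differentiable

with tf.GradientTape(persistent=True) as g:
  g.watch(x_int)    # TensorFlow logs a warning about the integer dtype.
  g.watch(x_float)
  y_int = x_int * x_int
  y_float = x_float * x_float

grad_int = g.gradient(y_int, x_int)        # None
grad_float = g.gradient(y_float, x_float)  # scalar tensor holding 6.0
del g

print(grad_int)
print(grad_float)
```

Casting the input to a floating-point dtype before it enters the tape (for example with tf.cast) is one way to avoid the None result.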
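|persistent||Boolean controlling whether a persistent gradient tape is created. False by default, which means at most one call can be made to the gradient() method on this object.|
|watch_accessed_variables||Boolean controlling whether the tape will automatically watch any trainable variables accessed while the tape is active. True by default.|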
batch_jacobian( target, source, unconnected_gradients=tf.UnconnectedGradients.NONE, parallel_iterations=None, experimental_use_pfor=True )
Computes and stacks per-example jacobians.
See the Wikipedia article on the Jacobian matrix for the definition of a Jacobian. This function is essentially an efficient implementation of the following:
tf.stack([self.jacobian(y[i], x[i]) for i in range(x.shape[0])]).
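To make the relationship concrete, here is a hedged sketch with a small batch chosen for illustration. It compares batch_jacobian against the per-example diagonal blocks of the full GradientTape.jacobian result; the element-wise function y = x * x and the name `per_example` are assumptions of this example, not part of the API description.

```python
import tensorflow as tf

# Batch of 2 examples with 3 features each (illustrative values).
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

with tf.GradientTape(persistent=True) as g:
  g.watch(x)
  y = x * x  # y[i] depends only on x[i]

batch_jac = g.batch_jacobian(y, x)  # shape [2, 3, 3]
full_jac = g.jacobian(y, x)         # shape [2, 3, 2, 3]
del g

# Block i of the batch Jacobian is the (i, i) block of the full Jacobian.
per_example = tf.stack([full_jac[i, :, i, :] for i in range(x.shape[0])])
max_diff = tf.reduce_max(tf.abs(batch_jac - per_example))

print(batch_jac.shape, full_jac.shape)
print(float(max_diff))
```

The full Jacobian also carries the cross-example blocks full_jac[i, :, j, :] for j != i, which are all zero for this function; batch_jacobian does not materialize them.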
Note that, compared to GradientTape.jacobian, which computes the gradient of
each output value w.r.t. each input value, this function is useful when
target[i,...] is independent of source[j,...] for j != i. This
assumption allows more efficient computation than
GradientTape.jacobian. The output, as well as intermediate activations,
are lower dimensional and avoid a bunch of redund