tf.autodiff.ForwardAccumulator

Computes Jacobian-vector products ("JVP"s) using forward-mode autodiff.
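For reference, here are the two products written out. The symbols below (f, x, v, u, J) are illustrative notation, not part of the API. Assume a differentiable function f: R^n → R^m evaluated at a point x:

```latex
J = \left.\frac{\partial f}{\partial x}\right|_{x} \in \mathbb{R}^{m \times n}, \qquad
\text{JVP: } J v \in \mathbb{R}^{m} \quad (v \in \mathbb{R}^{n}), \qquad
\text{VJP: } u^{\top} J \in \mathbb{R}^{1 \times n} \quad (u \in \mathbb{R}^{m})
```

Choosing v = e_i, the i-th basis vector, makes one JVP return column i of J. Choosing u = e_j makes one VJP return row j.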

Compare to tf.GradientTape, which computes vector-Jacobian products ("VJP"s) using reverse-mode autodiff (backprop). Reverse mode is more attractive when computing gradients of a scalar-valued function with respect to many inputs (e.g. a neural network with many parameters and a scalar loss). Forward mode works best on functions with many outputs and few inputs. Because it does not hold on to intermediate activations, it is much more memory efficient than backprop wherever it is applicable.
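To illustrate the many-outputs, few-inputs case, here is a minimal sketch that assumes a hypothetical function `f` mapping one scalar to three outputs. A single forward pass with tangent 1.0 yields the derivative of every output, which is the full 3×1 Jacobian. `tf.GradientTape.jacobian` recovers the same column in reverse mode, which conceptually requires one VJP per output.

```python
import tensorflow as tf

# Hypothetical example function: one scalar input, three outputs.
def f(x):
    return tf.stack([tf.sin(x), x ** 2, tf.exp(x)])

x = tf.constant(0.5)

# Forward mode: one pass with tangent 1.0 gives d(output_k)/dx for every k.
with tf.autodiff.ForwardAccumulator(primals=x, tangents=tf.constant(1.0)) as acc:
    y = f(x)
forward_jac = acc.jvp(y)

# Reverse mode: the tape builds the same Jacobian column from per-output VJPs.
with tf.GradientTape() as tape:
    tape.watch(x)
    y = f(x)
reverse_jac = tape.jacobian(y, x)

# Analytic derivatives of sin(x), x**2, exp(x) for comparison.
expected = tf.stack([tf.cos(x), 2.0 * x, tf.exp(x)])
print(forward_jac.numpy(), reverse_jac.numpy(), expected.numpy())
```

With millions of outputs and a handful of inputs, the forward-mode version still needs only as many passes as there are inputs.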

Consider a simple linear regression:

import tensorflow as tf

x = tf.constant([[2.0, 3.0], [1.0, 4.0]])
targets = tf.constant([[1.], [-1.]])
dense = tf.keras.layers.Dense(1)
dense.build([None, 2])
with tf.autodiff.ForwardAccumulator(
    primals=dense.kernel,
    tangents=tf.constant([[1.], [0.]])) as acc:
  loss = tf.reduce_sum((dense(x) - targets) ** 2.)
acc.jvp(loss)
# <tf.Tensor: shape=(), dtype=float32, numpy=...>
# (the value depends on the random initialization of dense.kernel)
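One way to check what this JVP means is to compare it against reverse mode. The sketch below is an assumption-laden variant of the example: it replaces the Keras layer with hypothetical plain `tf.Variable`s named `kernel` and `bias`, using fixed values so the result is reproducible. The bias starts at zero, as a Keras Dense layer's bias does by default. With the tangent [[1.], [0.]], the JVP is the partial derivative of the loss with respect to the first kernel entry, which equals the reverse-mode gradient dotted with the tangent.

```python
import tensorflow as tf

x = tf.constant([[2.0, 3.0], [1.0, 4.0]])
targets = tf.constant([[1.], [-1.]])

# Hypothetical stand-ins for dense.kernel and dense.bias with fixed values.
kernel = tf.Variable([[0.5], [-0.25]])
bias = tf.Variable([0.0])
tangent = tf.constant([[1.], [0.]])  # perturb only kernel[0, 0]

def loss_fn():
    # Same affine map a Dense(1) layer applies, followed by a squared error.
    pred = tf.matmul(x, kernel) + bias
    return tf.reduce_sum((pred - targets) ** 2.)

# Forward mode: directional derivative of the loss along `tangent`.
with tf.autodiff.ForwardAccumulator(primals=kernel, tangents=tangent) as acc:
    loss = loss_fn()
jvp = acc.jvp(loss)

# Reverse mode: full gradient w.r.t. the kernel, then dot it with `tangent`.
with tf.GradientTape() as tape:
    loss = loss_fn()
projected = tf.reduce_sum(tape.gradient(loss, kernel) * tangent)

print(float(jvp), float(projected))  # both approximately -2.0
```

By hand, the residuals are -0.75 and 0.5, so the derivative is 2·(-0.75)·2 + 2·(0.5)·1 = -2.0. Forward mode reaches this number without ever materializing the full gradient.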

The example has two variables containing parameters, dense.kernel (2 parameters) and dense.bias (1 parameter). Considering the training data x as a constant, this means the Jacobian matrix for the function mapping from parameters to loss has one row and