Record operations for automatic differentiation.

Operations are recorded if they are executed within this context manager and at least one of their inputs is being "watched".

Trainable variables (created by `tf.Variable` or `tf.compat.v1.get_variable`, where `trainable=True` is default in both cases) are automatically watched. Tensors can be manually watched by invoking the `watch` method on this context manager.

For example, consider the function `y = x * x`. The gradient at `x = 3.0` can be computed as:

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)  # x is a constant, so it must be watched explicitly
  y = x * x
dy_dx = g.gradient(y, x)
print(dy_dx)
# tf.Tensor(6.0, shape=(), dtype=float32)
```
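The difference between automatically watched variables and unwatched tensors can be sketched as follows. This is a minimal illustration, not part of the original page; the names `v` and `c` are hypothetical. It assumes the default behaviour in which `gradient()` returns `None` for a source the tape never watched.

```python
import tensorflow as tf

v = tf.Variable(3.0)   # trainable by default, so the tape watches it automatically
c = tf.constant(3.0)   # plain tensor: not watched unless g.watch(c) is called

with tf.GradientTape() as g:
  y = v * v + c * c

# Request gradients for both sources in a single call.
dy_dv, dy_dc = g.gradient(y, [v, c])
print(dy_dv)  # gradient with respect to v, i.e. 2 * v
print(dy_dc)  # None, because c was never watched
```

Calling `g.watch(c)` inside the context would make the second gradient available as well.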

GradientTapes can be nested to compute higher-order derivatives. For example:

```python
x = tf.constant(5.0)
with tf.GradientTape() as g:
  g.watch(x)
  with tf.GradientTape() as gg:
    gg.watch(x)
    y = x * x
  dy_dx = gg.gradient(y, x)  # dy_dx = 2 * x
d2y_dx2 = g.gradient(dy_dx, x)  # d2y_dx2 = 2
print(dy_dx)
# tf.Tensor(10.0, shape=(), dtype=float32)
print(d2y_dx2)
# tf.Tensor(2.0, shape=(), dtype=float32)
```

By default, the resources held by a GradientTape are released as soon as the `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the `gradient()` method, because the resources are released only when the tape object is garbage collected. For example:

```python
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
  g.watch(x)
  y = x * x
  z = y * y
dz_dx = g.gradient(z, x)  # (4*x^3 at x = 3)
print(dz_dx)
# tf.Tensor(108.0, shape=(), dtype=float32)
dy_dx = g.gradient(y, x)
print(dy_dx)
# tf.Tensor(6.0, shape=(), dtype=float32)
```
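For contrast, the sketch below shows what happens when the same two gradients are requested from a non-persistent tape. It is an illustration rather than part of the original page. The flag `second_call_failed` is a hypothetical name, and the code assumes the second call raises `RuntimeError`.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:  # non-persistent, which is the default
  g.watch(x)
  y = x * x
  z = y * y

# The first call succeeds and releases the resources held by the tape.
dz_dx = g.gradient(z, x)

try:
  g.gradient(y, x)  # second call on the same non-persistent tape
  second_call_failed = False
except RuntimeError as err:
  second_call_failed = True
  print("Second gradient() call failed:", err)
```

A persistent tape keeps its resources until it is garbage collected, so it can help to drop the reference (for example with `del g`) once all gradients have been computed.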

By default GradientTape will automatically watch any trainable variables that are accessed inside the context. If you wa