
Update the covariance matrix a and the weighted sum of rewards b.

    a_prev, b_prev, r, x, gamma, compute_eigendecomp=False

This function updates the covariance matrix a and the weighted sum of rewards b, using the forgetting factor gamma to discount the previous estimates.


  • a_prev: previous estimate of a.
  • b_prev: previous estimate of b.
  • r: a Tensor of shape [batch_size]. The rewards of the batched observations.
  • x: a Tensor of shape [batch_size, context_dim]. The matrix of batched observations.
  • gamma: a float forgetting factor in [0.0, 1.0].
  • compute_eigendecomp: whether to compute the eigen-decomposition of the new covariance matrix.


The updated estimates of a and b and, optionally, the eigenvalues and eigenvectors of the updated a.
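
To make the update concrete, here is a minimal NumPy sketch of one common form of a discounted update. It is an illustration, not the library source. It assumes that the previous estimates are scaled by gamma before the batch terms x^T x and x^T r are added. The function name `update_a_and_b_sketch` and the choice to return only two values when `compute_eigendecomp` is false are hypothetical, so the real function may return a different structure.

```python
import numpy as np


def update_a_and_b_sketch(a_prev, b_prev, r, x, gamma, compute_eigendecomp=False):
    """Hypothetical NumPy restatement of a discounted update (not the library code)."""
    a_prev = np.asarray(a_prev, dtype=float)  # [context_dim, context_dim]
    b_prev = np.asarray(b_prev, dtype=float)  # [context_dim]
    r = np.asarray(r, dtype=float)            # [batch_size]
    x = np.asarray(x, dtype=float)            # [batch_size, context_dim]

    # Assumed rule: discount the old statistics by gamma, then add the batch terms.
    a_new = gamma * a_prev + x.T @ x
    b_new = gamma * b_prev + x.T @ r

    if compute_eigendecomp:
        # a_new is symmetric, so eigh applies; eigenvalues come back in ascending order.
        eig_vals, eig_vecs = np.linalg.eigh(a_new)
        return a_new, b_new, eig_vals, eig_vecs
    return a_new, b_new


# Two observations in a 2-dimensional context, with gamma = 0.5.
a0 = np.eye(2)
b0 = np.zeros(2)
x = np.array([[1.0, 0.0], [0.0, 2.0]])
r = np.array([1.0, 0.5])
a1, b1 = update_a_and_b_sketch(a0, b0, r, x, gamma=0.5)
```

Under this assumed rule, gamma = 1.0 keeps all past data at full weight, while gamma = 0.0 discards the previous estimates and keeps only the current batch.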