tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting

Update the covariance matrix a and the weighted sum of rewards b.

tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting(
    a_prev, b_prev, r, x, gamma, compute_eigendecomp=False
)

This function updates the covariance matrix a and the weighted sum of rewards b, using a forgetting factor gamma to discount the previous estimates.

Args:

  • a_prev: previous estimate of a.
  • b_prev: previous estimate of b.
  • r: a Tensor of shape [batch_size]. The rewards of the batched observations.
  • x: a Tensor of shape [batch_size, context_dim]. The matrix of batched observations, one observation per row.
  • gamma: a float forgetting factor in [0.0, 1.0].
  • compute_eigendecomp: whether to compute the eigen-decomposition of the new covariance matrix.

Returns:

The updated estimates of a and b and, if compute_eigendecomp is True, the eigenvalues and eigenvectors of the updated a.
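
The update described above can be illustrated with a minimal plain-Python sketch. It assumes the usual discounted least-squares rule, a_new = gamma * a_prev + xᵀx and b_new = gamma * b_prev + xᵀr; this page does not state the formula, so treat the rule as an assumption rather than the library's exact implementation. The helper name update_a_and_b_sketch is hypothetical, it works on nested lists instead of Tensors, and it omits the optional eigen-decomposition.

```python
def update_a_and_b_sketch(a_prev, b_prev, r, x, gamma):
    """Return (a_new, b_new) for one batch, using plain Python lists.

    Assumed rule (not confirmed by the documentation above):
        a_new = gamma * a_prev + x^T x
        b_new = gamma * b_prev + x^T r
    """
    dim = len(b_prev)
    # Discount the old covariance, then add the sum of outer products of the rows of x.
    a_new = [
        [gamma * a_prev[i][j] + sum(row[i] * row[j] for row in x)
         for j in range(dim)]
        for i in range(dim)
    ]
    # Discount the old reward sum, then add each observation scaled by its reward.
    b_new = [
        gamma * b_prev[i] + sum(reward * row[i] for reward, row in zip(r, x))
        for i in range(dim)
    ]
    return a_new, b_new


# Two observations (batch_size = 2) with context_dim = 2.
a_prev = [[1.0, 0.0], [0.0, 1.0]]
b_prev = [0.0, 0.0]
x = [[1.0, 2.0], [3.0, 4.0]]
r = [1.0, 2.0]

a_new, b_new = update_a_and_b_sketch(a_prev, b_prev, r, x, gamma=0.5)
print(a_new)  # [[10.5, 14.0], [14.0, 20.5]]
print(b_new)  # [7.0, 10.0]
```

Under this assumed rule, gamma = 1.0 keeps the previous estimates at full weight (no forgetting), while gamma = 0.0 discards them and uses only the current batch.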