tf_agents.bandits.agents.linear_bandit_agent.update_a_and_b_with_forgetting


Update the covariance matrix a and the weighted sum of rewards b.

The update uses a forgetting factor gamma, which down-weights the previous estimates of a and b before the statistics of the new batch are incorporated.

Args
a_prev: the previous estimate of a.
b_prev: the previous estimate of b.
r: a Tensor of shape [batch_size] holding the rewards of the batched observations.
x: a Tensor of shape [batch_size, context_dim] holding the (batched) observations, one per row.
gamma: a float forgetting factor in [0.0, 1.0].
compute_eigendecomp: whether to compute the eigendecomposition of the new covariance matrix.

Returns
The updated estimates of a and b and, optionally, the eigenvalues and eigenvectors of a.
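
The description above does not spell out the update rule, so the following is only a minimal NumPy sketch of one common form of a discounted linear-bandit update. It assumes a_new = gamma * a_prev + x^T x and b_new = gamma * b_prev + x^T r. The function name update_a_and_b_sketch, the empty-array placeholders returned when no eigendecomposition is requested, and the example values are all hypothetical, and this is not the TF-Agents implementation.

```python
import numpy as np


def update_a_and_b_sketch(a_prev, b_prev, r, x, gamma, compute_eigendecomp=False):
    """Hypothetical NumPy stand-in for a forgetting-factor update."""
    # Assumed rule: discount the previous estimates by gamma, then add the
    # batch statistics x^T x (for a) and x^T r (for b).
    a_new = gamma * a_prev + x.T @ x
    b_new = gamma * b_prev + x.T @ r

    if compute_eigendecomp:
        # a_new is symmetric when a_prev is, so eigh is appropriate.
        eig_vals, eig_vecs = np.linalg.eigh(a_new)
    else:
        # Assumption: empty placeholders when the decomposition is skipped.
        eig_vals, eig_vecs = np.empty(0), np.empty(0)
    return a_new, b_new, eig_vals, eig_vecs


# Example with batch_size = 2 and context_dim = 3.
context_dim = 3
a_prev = np.eye(context_dim)
b_prev = np.zeros(context_dim)
x = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
r = np.array([1.0, 0.5])

a_new, b_new, eig_vals, eig_vecs = update_a_and_b_sketch(
    a_prev, b_prev, r, x, gamma=0.9, compute_eigendecomp=True)
```

Under this assumed rule, gamma = 1.0 keeps the full history and gamma = 0.0 discards everything except the current batch. Intermediate values let older observations fade geometrically.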