tf_agents.utils.common.compute_returns

Compute the discounted return from each index (timestep) in an episode.

Args
rewards: Tensor of per-timestep rewards in the episode.
discounts: Tensor of per-timestep discount factors. Should be 0 for the final step of each episode.

Returns
Tensor of per-timestep cumulative returns.
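
To make the meaning of "cumulative return" concrete, here is a minimal pure-Python sketch of the computation. It assumes the usual backward recursion G[t] = rewards[t] + discounts[t] * G[t+1], with the return past the last step taken as 0. The helper name `reference_returns` is hypothetical and is not part of TF-Agents; the library function operates on tensors, whereas this sketch uses plain lists for illustration only.

```python
def reference_returns(rewards, discounts):
    """Hypothetical reference: G[t] = rewards[t] + discounts[t] * G[t+1].

    Assumes the return beyond the final index is 0. Illustration only,
    not the TF-Agents implementation.
    """
    if len(rewards) != len(discounts):
        raise ValueError("rewards and discounts must have the same length")

    returns = [0.0] * len(rewards)
    next_return = 0.0  # assumed return after the last timestep
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        next_return = rewards[t] + discounts[t] * next_return
        returns[t] = next_return
    return returns


# Single episode: the final discount is 0, as the argument description requires.
print(reference_returns([1.0, 1.0, 1.0], [0.5, 0.5, 0.0]))
# [1.75, 1.5, 1.0]

# Two back-to-back episodes: the 0 discount at index 1 stops rewards from
# the second episode leaking into the returns of the first.
print(reference_returns([1.0, 1.0, 1.0, 1.0], [0.5, 0.0, 0.5, 0.0]))
# [1.5, 1.0, 1.5, 1.0]
```

The second call shows why the discount should be 0 on the final step of each episode: under this recursion, a zero discount cuts the accumulation at the episode boundary.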