tf_agents.utils.common.compute_returns

Compute the return from each index in an episode.

rewards Tensor [T], [B, T], [T, B] of per-timestep reward.
discounts Tensor [T], [B, T], [T, B] of per-timestep discount factor. Should be 0. for final step of each episode.
time_major Bool, when batched inputs setting it to True, inputs are expected to be time-major: [T, B] otherwise, batch-major: [B, T].

Tensor of per-timestep cumulative returns.