tf_agents.utils.common.discounted_future_sum

Discounted future sum of batch-major values.

values A Tensor of shape [batch_size, total_steps] and dtype float32.
gamma A float discount value.
num_steps A positive integer number of future steps to sum.

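Returns a Tensor of shape [batch_size, total_steps], where each entry (i, j) is the sum of the terms gamma^k * values[i, j + k] for k = 0, ..., num_steps - 1, that is, from gamma^0 * values[i, j] to gamma^(num_steps - 1) * values[i, j + num_steps - 1]. values is treated as zero-padded past its last step, so any term with j + k >= total_steps contributes zero.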

For example, a single row of values, [5, 6, 7], with gamma=0.9 and num_steps of at least 3, results in the row:

[(5 * 0.9^0 + 6 * 0.9^1 + 7 * 0.9^2), (6 * 0.9^0 + 7 * 0.9^1), 7 * 0.9^0]

Raises ValueError if values is not of rank 2.
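
The semantics described above can be checked by hand with a small pure-Python sketch. This is not the library's implementation: the helper name discounted_future_sum_reference is hypothetical, it works on nested lists rather than Tensors, and its rank check is only a rough stand-in for the documented ValueError.

```python
def discounted_future_sum_reference(values, gamma, num_steps):
    """Hypothetical reference helper mirroring the documented semantics.

    values is a list of rows (batch-major), standing in for a rank-2 Tensor.
    """
    # Rough stand-in for the documented rank-2 requirement.
    if not values or not all(isinstance(row, (list, tuple)) for row in values):
        raise ValueError("values must be rank 2: a non-empty list of rows")

    result = []
    for row in values:
        out_row = []
        for j in range(len(row)):
            # Take at most num_steps entries starting at j. Slicing past the
            # end of the row yields fewer entries, which is equivalent to
            # padding the row with zeros.
            window = row[j:j + num_steps]
            out_row.append(sum(gamma ** k * v for k, v in enumerate(window)))
        result.append(out_row)
    return result


# The example from the text: one row, gamma=0.9, num_steps=3.
full = discounted_future_sum_reference([[5.0, 6.0, 7.0]], gamma=0.9, num_steps=3)
print(full)  # approximately [[16.07, 12.3, 7.0]]

# With num_steps=2 the first entry drops its third term (5 + 6 * 0.9).
short = discounted_future_sum_reference([[5.0, 6.0, 7.0]], gamma=0.9, num_steps=2)
print(short)  # approximately [[10.4, 12.3, 7.0]]
```

Passing the same inputs as a float32 Tensor of shape [1, 3] to discounted_future_sum should give matching values up to floating-point precision.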