tf_agents.utils.value_ops.discounted_return

Computes discounted return.

tf_agents.utils.value_ops.discounted_return(
    rewards,
    discounts,
    final_value=None,
    time_major=True,
    provide_all_returns=True
)

Q_n = sum_{n'=n}^N gamma^(n'-n) * r_{n'} + gamma^(N-n+1)*final_value.

For details, see "Reinforcement Learning: An Introduction", Second Edition, by Richard S. Sutton and Andrew G. Barto.

B: batch size, i.e. the number of trajectories. T: number of steps per trajectory; this is equal to N - n in the equation above.
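The equation above can be evaluated with a single backward pass over one trajectory. The following is a minimal pure-Python sketch of that recursion, not the library's implementation: `discounted_return_sketch` is a hypothetical helper name, and it assumes a single trajectory of shape `[T]` with a per-step discount, so a constant `discounts` list plays the role of gamma.

```python
def discounted_return_sketch(rewards, discounts, final_value=0.0):
    """Hypothetical reference for one trajectory of shape [T].

    Applies Q_t = r_t + discount_t * Q_{t+1}, starting from
    Q_T = final_value, which matches the equation above when
    every discount equals gamma.
    """
    returns = [0.0] * len(rewards)
    running = final_value
    # Walk backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discounts[t] * running
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0]
discounts = [0.5, 0.5, 0.5]

# No bootstrap value: the last return is just the last reward.
print(discounted_return_sketch(rewards, discounts))        # [1.75, 1.5, 1.0]

# A final value of 8.0 is discounted into every earlier return.
print(discounted_return_sketch(rewards, discounts, 8.0))   # [2.75, 3.5, 5.0]
```

With `final_value=8.0`, the last entry is `1.0 + 0.5 * 8.0 = 5.0`, which corresponds to the `gamma^(N-n+1)*final_value` term for `n = N`.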

Args
`rewards`	Tensor with shape `[T, B]` (or `[T]`) representing rewards.
`discounts`	Tensor with shape `[T, B]` (or `[T]`) representing discounts.
`final_value`	(Optional.) Tensor with shape `[B]` (or `[1]`) representing the value estimate at step `T`. Defaults to an all-zeros tensor. When set, it bootstraps the return computation from this final value.
`time_major`	A boolean indicating whether input tensors are time major. False means input tensors have shape `[B, T]`.
`provide_all_returns`	A boolean; if True, the returns for every step along the time dimension are provided; if False, only the single complete discounted return is given.

Returns
If `provide_all_returns`: A tensor with shape `[T, B]` (or `[T]`) representing the discounted returns. The shape is `[B, T]` when `not time_major`.
If `not provide_all_returns`: A tensor with shape `[B]` (or `[]`) representing the discounted returns.
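To make the documented shapes concrete, here is a rough nested-list sketch of how `time_major` and `provide_all_returns` affect the layout of the result. It is an illustration under stated assumptions rather than the library code: `batched_return_sketch` is a hypothetical name, plain Python lists stand in for tensors, and the single complete return is assumed to be the return from the first step of each trajectory.

```python
def batched_return_sketch(rewards, discounts, final_value=None,
                          time_major=True, provide_all_returns=True):
    """Hypothetical nested-list stand-in that mirrors the documented shapes."""
    if not time_major:
        # Inputs arrive as [B, T]; transpose to [T, B] for the backward pass.
        rewards = [list(col) for col in zip(*rewards)]
        discounts = [list(col) for col in zip(*discounts)]
    num_steps, batch = len(rewards), len(rewards[0])
    # Default bootstrap value: all zeros, one entry per trajectory.
    running = list(final_value) if final_value is not None else [0.0] * batch
    all_returns = [None] * num_steps
    for t in reversed(range(num_steps)):
        running = [r + d * q
                   for r, d, q in zip(rewards[t], discounts[t], running)]
        all_returns[t] = running
    if not provide_all_returns:
        # Assumption: the complete return is the one from the first step; shape [B].
        return all_returns[0]
    if not time_major:
        # Transpose back so the output matches the [B, T] input layout.
        all_returns = [list(row) for row in zip(*all_returns)]
    return all_returns


# T = 3 steps, B = 2 trajectories, time-major layout [T, B].
rewards_tb = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
discounts_tb = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

print(batched_return_sketch(rewards_tb, discounts_tb))
# [[1.75, 3.5], [1.5, 3.0], [1.0, 2.0]]  -> shape [T, B]

print(batched_return_sketch(rewards_tb, discounts_tb,
                            provide_all_returns=False))
# [1.75, 3.5]  -> shape [B]
```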
