tf_agents.utils.value_ops.generalized_advantage_estimation

Computes generalized advantage estimation (GAE).

tf_agents.utils.value_ops.generalized_advantage_estimation(
    values, final_value, discounts, rewards, td_lambda=1.0, time_major=True
)

For the theory, see "High-Dimensional Continuous Control Using Generalized Advantage Estimation" by John Schulman, Philipp Moritz, et al. The full paper is available at https://arxiv.org/abs/1506.02438.

Define abbreviations:
(B) batch size representing number of trajectories
(T) number of steps per trajectory
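
As a rough sketch of the quantity being computed, the usual GAE recursion from the cited paper can be written as below. This assumes that each entry of `discounts` already includes the discount factor (and is zero at an episode boundary), that V_T stands for `final_value`, and that λ is `td_lambda`. The symbols δ_t, d_t, r_t, V_t and A_t are notation introduced here for illustration; they are not names used by the library.

```latex
\begin{aligned}
\delta_t &= r_t + d_t \, V_{t+1} - V_t, \qquad t = 0, \dots, T-1, \\
A_t &= \delta_t + d_t \, \lambda \, A_{t+1}, \qquad A_T = 0 .
\end{aligned}
```

Under these assumptions, λ = 0 reduces A_t to the one-step TD error δ_t, while λ = 1 gives the discounted return (bootstrapped from V_T) minus V_t.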

Args
`values`	Tensor with shape `[T, B]` representing value estimates.
`final_value`	Tensor with shape `[B]` representing value estimate at t=T.
`discounts`	Tensor with shape `[T, B]` representing discounts received by following the behavior policy.
`rewards`	Tensor with shape `[T, B]` representing rewards received by following the behavior policy.
`td_lambda`	A float32 scalar in the range [0, 1]. It is used for variance reduction in temporal-difference estimation.
`time_major`	A boolean indicating whether input tensors are time major. False means input tensors have shape `[B, T]`.

Returns
A tensor with shape `[T, B]` representing advantages. Shape is `[B, T]` when `not time_major`.
