tf_agents.utils.value_ops.generalized_advantage_estimation
Computes generalized advantage estimation (GAE).
tf_agents.utils.value_ops.generalized_advantage_estimation(
values, final_value, discounts, rewards, td_lambda=1.0, time_major=True
)
For the theory, see
"High-Dimensional Continuous Control Using Generalized Advantage Estimation"
by John Schulman, Philipp Moritz et al.
The full paper is available at https://arxiv.org/abs/1506.02438.
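In the notation of the signature above, write V_t = values[t], r_t = rewards[t], γ_t = discounts[t] and λ = td_lambda. The estimator can then be sketched as a backward recursion over the T steps. This sketch assumes that discounts[t] scales the bootstrap value of the next state, so a terminal step is encoded as a zero discount. It follows the paper's definition rather than the library source. With a constant discount γ, it reduces to the paper's truncated sum of (γλ)^l δ_{t+l}.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma_t V_{t+1} - V_t, \qquad t = 0, \dots, T-1, \qquad V_T = \texttt{final\_value},\\
A_t &= \delta_t + \gamma_t \lambda A_{t+1}, \qquad A_T = 0,\\
    &= \sum_{l=0}^{T-1-t} \Big( \prod_{k=0}^{l-1} \lambda \gamma_{t+k} \Big) \delta_{t+l}.
\end{aligned}
```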
Abbreviations:
(B) batch size, i.e. the number of trajectories
(T) number of steps per trajectory
Args:
values: Tensor with shape [T, B] representing value estimates.
final_value: Tensor with shape [B] representing the value estimate at t=T, used to bootstrap the last step.
discounts: Tensor with shape [T, B] representing the discounts received by following the behavior policy.
rewards: Tensor with shape [T, B] representing the rewards received by following the behavior policy.
td_lambda: A float32 scalar in [0, 1], used for variance reduction in temporal-difference estimation. Smaller values lower the variance of the advantages at the cost of more bias. A value of 0 yields one-step TD errors. A value of 1 yields discounted returns (bootstrapped from final_value) minus the value estimates.
time_major: A boolean indicating whether the input tensors are time major. False means the input tensors have shape [B, T].
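To illustrate how the arguments fit together, here is a NumPy sketch of the same recursion. The helper name `gae_reference` is hypothetical and is not part of tf_agents. It is not the library's implementation. It assumes that discounts[t] multiplies the bootstrap value of the next state and that final_value bootstraps the last step. It handles `time_major=False` by transposing to [T, B] and back.

```python
import numpy as np


def gae_reference(values, final_value, discounts, rewards,
                  td_lambda=1.0, time_major=True):
    """Hypothetical NumPy sketch of GAE (not the tf_agents implementation)."""
    values = np.asarray(values, dtype=np.float64)
    final_value = np.asarray(final_value, dtype=np.float64)
    discounts = np.asarray(discounts, dtype=np.float64)
    rewards = np.asarray(rewards, dtype=np.float64)
    if not time_major:
        # Batch-major inputs are [B, T]; move time to axis 0.
        values, discounts, rewards = values.T, discounts.T, rewards.T

    # V_{t+1} for each step, with final_value ([B]) appended as V_T.
    next_values = np.concatenate([values[1:], final_value[None, :]], axis=0)
    # One-step TD errors delta_t, shape [T, B].
    deltas = rewards + discounts * next_values - values

    advantages = np.zeros_like(deltas)
    acc = np.zeros_like(final_value)  # A_T = 0
    # Backward recursion: A_t = delta_t + gamma_t * lambda * A_{t+1}.
    for t in reversed(range(deltas.shape[0])):
        acc = deltas[t] + discounts[t] * td_lambda * acc
        advantages[t] = acc
    # Return the same layout as the inputs.
    return advantages if time_major else advantages.T


# Example: T = 3 steps, B = 1 trajectory; the last step ends the episode.
values = np.array([[0.5], [0.5], [0.5]])
final_value = np.array([0.5])
discounts = np.array([[0.9], [0.9], [0.0]])
rewards = np.array([[1.0], [1.0], [1.0]])

adv_td = gae_reference(values, final_value, discounts, rewards, td_lambda=0.0)
adv_mc = gae_reference(values, final_value, discounts, rewards, td_lambda=1.0)
# adv_td[:, 0] is approximately [0.95, 0.95, 0.5] (the one-step TD errors).
# adv_mc[:, 0] is approximately [2.21, 1.4, 0.5].

# Batch-major inputs give the transposed result.
adv_bm = gae_reference(values.T, final_value, discounts.T, rewards.T,
                       td_lambda=1.0, time_major=False)
```

The λ = 1 result equals the discounted returns bootstrapped from final_value (2.71, 1.9, 1.0) minus the value estimates. With λ = 0, only the one-step TD errors remain, which matches the td_lambda description above.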
Returns:
A tensor with shape [T, B] representing the advantages, or shape [B, T] when time_major is False.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.