A tuple that represents a trajectory.
tf_agents.trajectories.Trajectory( step_type, observation, action, policy_info, next_step_type, reward, discount )
Trajectory represents a sequence of aligned time steps. It captures the
observation and step_type from the current time step together with the
computed action and policy_info. The discount, reward, and next_step_type
come from the next time step.
| Attribute | Description |
|---|---|
| `observation` | An array (tensor), or a nested dict, list, or tuple of arrays (tensors), that represents the observation. |
| `action` | An array (tensor), or a nested dict, list, or tuple of actions, generated according to the observation. |
| `policy_info` | An arbitrary nest that contains auxiliary information related to the action. Note that this does not include the policy/RNN state which was used to generate the action. |
| `reward` | An array (tensor), or a nested dict, list, or tuple of rewards. This represents the rewards and/or constraint satisfiability after performing the action in an environment. |
| `discount` | A scalar representing the discount factor to multiply with future rewards. |
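To make the field layout concrete, here is a minimal stand-in built with `collections.namedtuple`. It mirrors the field order of `tf_agents.trajectories.Trajectory` from the signature above, but it is only an illustrative sketch: the placeholder values (and the whole stand-in class) are assumptions, not the real TF-Agents type.

```python
from collections import namedtuple

# Illustrative stand-in with the same fields, in the same order, as
# tf_agents.trajectories.Trajectory (the real class is a namedtuple
# subclass with additional helpers).
Trajectory = namedtuple(
    "Trajectory",
    ["step_type", "observation", "action", "policy_info",
     "next_step_type", "reward", "discount"],
)

# Placeholder values: observation/step_type come from the current time
# step; reward, discount, and next_step_type come from the next one.
traj = Trajectory(
    step_type=1,
    observation=[0.0, 1.0],
    action=0,
    policy_info=(),
    next_step_type=1,
    reward=1.0,
    discount=0.99,
)
print(traj.reward)  # -> 1.0
```

In practice each field would hold a (possibly nested) tensor with a shared batch/time dimension, which is what "aligned time steps" refers to.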
replace( **kwargs ) -> "Trajectory"
Exposed as namedtuple._replace.
new_trajectory = trajectory.replace(policy_info=())
This returns a new trajectory with an empty policy_info.
| Args | Description |
|---|---|
| `**kwargs` | key/value pairs of fields in the trajectory. |
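Since `replace` is exposed as `namedtuple._replace`, its semantics can be sketched with a plain namedtuple: the call returns a new tuple with the given fields swapped out, leaving the original unchanged. The `Trajectory` namedtuple below is an assumed stand-in, not the real TF-Agents class.

```python
from collections import namedtuple

# Stand-in with the same fields as tf_agents.trajectories.Trajectory.
Trajectory = namedtuple(
    "Trajectory",
    ["step_type", "observation", "action", "policy_info",
     "next_step_type", "reward", "discount"],
)

traj = Trajectory(1, [0.0], 0, {"logits": [0.5]}, 1, 1.0, 0.99)

# _replace returns a fresh Trajectory with an empty policy_info;
# all other fields are carried over untouched.
new_trajectory = traj._replace(policy_info=())

print(new_trajectory.policy_info)  # -> ()
print(traj.policy_info)            # the original is unchanged
```

Dropping `policy_info` this way is a common step before storing trajectories in a replay buffer when the auxiliary policy information is not needed for training.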