A tuple that represents a trajectory.
@staticmethod tf_agents.trajectories.trajectory.Trajectory( _cls, step_type, observation, action, policy_info, next_step_type, reward, discount )
Trajectory is a sequence of aligned time steps. It captures the
observation and step_type from the current time step, together with the computed action
and policy_info. Discount, reward and next_step_type come from the next time step.
observation: An array (tensor), or a nested dict, list or tuple of arrays (tensors) that represents the observation.
action: An array (tensor), or a nested dict, list or tuple of actions. This represents the action generated according to the observation.
policy_info: An arbitrary nest that contains auxiliary information related to the action. Note that this does not include the policy/RNN state which was used to generate the action.
next_step_type: The StepType of the next time step.
reward: An array (tensor), or a nested dict, list, or tuple of rewards. This represents the rewards and/or constraint satisfiability after performing the action in an environment.
discount: A scalar representing the discount factor by which future rewards are multiplied.
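The alignment described above can be illustrated with a minimal sketch. It deliberately uses plain `collections.namedtuple` stand-ins rather than the real tf_agents classes, so it runs without TensorFlow. The `TimeStep` stand-in, the helper `pair_to_trajectory`, and the integer step-type codes are assumptions made for illustration only.

```python
from collections import namedtuple

# Stand-in with the same field names as the documented Trajectory
# (an illustration, not the tf_agents class itself).
Trajectory = namedtuple(
    "Trajectory",
    ["step_type", "observation", "action", "policy_info",
     "next_step_type", "reward", "discount"],
)

# Hypothetical minimal time step record used only in this sketch.
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])

# Assumed integer codes for the step types.
FIRST, MID, LAST = 0, 1, 2


def pair_to_trajectory(time_step, action, policy_info, next_time_step):
    """Hypothetical helper: align two consecutive time steps into one trajectory."""
    return Trajectory(
        # Taken from the current time step.
        step_type=time_step.step_type,
        observation=time_step.observation,
        # Computed by the policy for the current observation.
        action=action,
        policy_info=policy_info,
        # Taken from the next time step.
        next_step_type=next_time_step.step_type,
        reward=next_time_step.reward,
        discount=next_time_step.discount,
    )


current = TimeStep(FIRST, 0.0, 1.0, [0.1, 0.2])
following = TimeStep(MID, 1.5, 0.99, [0.3, 0.4])
traj = pair_to_trajectory(current, action=1, policy_info=(), next_time_step=following)
```

Here `traj.observation` and `traj.step_type` describe the step in which the action was chosen, while `traj.reward`, `traj.discount` and `traj.next_step_type` describe the step that resulted from it.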
replace( **kwargs )
Exposes namedtuple._replace.
new_trajectory = trajectory.replace(policy_info=())
This returns a new trajectory with an empty policy_info.
**kwargs: key/value pairs of fields in the trajectory.
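Because replace is described as exposing namedtuple._replace, its behavior can be sketched with a plain namedtuple subclass. This stand-in class is an assumption for illustration, not the tf_agents implementation. It shows that the call returns a new tuple and leaves the original untouched.

```python
from collections import namedtuple

_FIELDS = ["step_type", "observation", "action", "policy_info",
           "next_step_type", "reward", "discount"]


class Trajectory(namedtuple("Trajectory", _FIELDS)):
    """Stand-in tuple with the documented field names (illustration only)."""
    __slots__ = ()

    def replace(self, **kwargs):
        # Delegate to namedtuple._replace, which builds a new tuple.
        return self._replace(**kwargs)


trajectory = Trajectory(
    step_type=0,
    observation=[0.1, 0.2],
    action=1,
    policy_info={"log_probability": -0.7},  # hypothetical auxiliary info
    next_step_type=1,
    reward=1.5,
    discount=0.99,
)

# Same call as in the usage line above: drop the policy_info.
new_trajectory = trajectory.replace(policy_info=())
```

The original `trajectory` still carries its policy_info afterwards. Only `new_trajectory` has the empty one, and all other fields are carried over unchanged.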