A tuple that represents a trajectory.

A Trajectory represents a sequence of aligned time steps. It captures the observation and step_type from the current time step, together with the computed action and policy_info. The discount, reward, and next_step_type come from the next time step.

step_type A StepType.
observation An array (tensor), or a nested dict, list or tuple of arrays (tensors) that represents the observation.
action An array/a tensor, or a nested dict, list or tuple of actions. This represents the action generated according to the observation.
policy_info An arbitrary nest that contains auxiliary information related to the action. Note that this does not include the policy/RNN state which was used to generate the action.
next_step_type The StepType of the next time step.
reward An array/a tensor, or a nested dict, list, or tuple of rewards. This represents the rewards and/or constraint satisfiability after performing the action in an environment.
discount A scalar representing the discount factor to multiply with future rewards.
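
The alignment of fields described above can be sketched with plain namedtuples. This is an illustration only: `TimeStep`, `SketchTrajectory`, and `build_trajectory` are hypothetical stand-ins defined here with the standard library, not the library's own classes or helpers, and the integer step type codes are placeholders.

```python
import collections

# Hypothetical stand-ins for illustration; not the library's own classes.
TimeStep = collections.namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"])
SketchTrajectory = collections.namedtuple(
    "SketchTrajectory",
    ["step_type", "observation", "action", "policy_info",
     "next_step_type", "reward", "discount"])


def build_trajectory(time_step, action, policy_info, next_time_step):
    """Combine the current step, the chosen action, and the next step."""
    return SketchTrajectory(
        # Taken from the current time step.
        step_type=time_step.step_type,
        observation=time_step.observation,
        # Computed from the current observation.
        action=action,
        policy_info=policy_info,
        # Taken from the next time step.
        next_step_type=next_time_step.step_type,
        reward=next_time_step.reward,
        discount=next_time_step.discount,
    )


# Placeholder step type codes: 0 for a first step, 1 for a middle step.
current = TimeStep(step_type=0, reward=0.0, discount=1.0,
                   observation=[0.1, 0.2])
following = TimeStep(step_type=1, reward=1.5, discount=0.99,
                     observation=[0.3, 0.4])
example = build_trajectory(current, action=1, policy_info=(),
                           next_time_step=following)
```

In this sketch, `example.observation` is the observation the action was chosen from, while `example.reward` is the reward received after taking that action.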



Exposes namedtuple._replace as replace.


  new_trajectory = trajectory.replace(policy_info=())

This returns a new trajectory with an empty policy_info.

**kwargs Key/value pairs of fields in the trajectory to replace.

A new Trajectory.
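
Because the method forwards to `namedtuple._replace`, the original object is left untouched and only the named fields differ in the result. The following is a minimal sketch of that behaviour, assuming a hypothetical `SketchTrajectory` class built on the standard library rather than the library's own Trajectory; the field values are made up for the example.

```python
import collections

# Hypothetical stand-in for illustration; not the library's own class.
_Fields = collections.namedtuple(
    "_Fields",
    ["step_type", "observation", "action", "policy_info",
     "next_step_type", "reward", "discount"])


class SketchTrajectory(_Fields):
    __slots__ = ()

    def replace(self, **kwargs):
        """Return a copy with the given fields replaced."""
        return self._replace(**kwargs)


original = SketchTrajectory(
    step_type=0,
    observation=[0.1],
    action=1,
    policy_info={"log_prob": -0.7},
    next_step_type=1,
    reward=1.0,
    discount=0.99,
)

# Only policy_info changes; every other field is carried over.
updated = original.replace(policy_info=())
```

With a plain namedtuple, passing a keyword that is not one of the field names raises a `ValueError`.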