tf_agents.trajectories.trajectory.from_episode

Create a Trajectory from tensors representing a single episode.

If none of the inputs are tensors, then numpy arrays are generated instead.

If discount is not provided, T is inferred from the leading dimension of the first flattened entry in reward:

reward_0 = tf.nest.flatten(reward)[0]
T = shape(reward_0)[0]

In this case, a discount of all ones having dtype float32 is generated.
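The default-discount rule above can be sketched with NumPy alone. This is only an illustration of the described behavior, not the library's implementation: `flatten` is a hypothetical stand-in for `tf.nest.flatten` that handles nested tuples and lists only, and `default_discount` is a hypothetical helper name.

```python
import numpy as np


def flatten(nest):
    """Hypothetical stand-in for tf.nest.flatten (tuples and lists only)."""
    if isinstance(nest, (tuple, list)):
        leaves = []
        for item in nest:
            leaves.extend(flatten(item))
        return leaves
    return [nest]


def default_discount(reward):
    """Build an all-ones float32 discount of length T, inferred from reward."""
    reward_0 = flatten(reward)[0]      # first leaf of the reward structure
    T = np.shape(reward_0)[0]          # leading (time) dimension
    return np.ones([T], dtype=np.float32)


# A nested reward whose leaves both have T = 3 steps.
reward = (np.array([0.0, 1.0, 0.5]), np.zeros((3, 2)))
discount = default_discount(reward)
```

Here `discount` has shape `(3,)` and dtype `float32`, which matches the length of the first reward leaf.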

Args:
  observation: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  action: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  policy_info: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  reward: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  discount: (optional) a floating-point vector Tensor or np.ndarray; shaped [T].
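Since every argument shares the same leading time dimension T, it can help to check this before building a trajectory. The following is a minimal sketch for NumPy inputs; `leading_dims` and `check_time_dimension` are hypothetical helpers written for illustration and are not part of TF-Agents.

```python
import numpy as np


def leading_dims(nest):
    """Collect the leading dimension of every array leaf in nested tuples/lists."""
    if isinstance(nest, (tuple, list)):
        dims = []
        for item in nest:
            dims.extend(leading_dims(item))
        return dims
    return [np.shape(nest)[0]]


def check_time_dimension(observation, action, policy_info, reward, discount=None):
    """Return the shared T, or raise ValueError if the leaves disagree."""
    dims = leading_dims((observation, action, policy_info, reward))
    if discount is not None:
        dims.append(np.shape(discount)[0])
    if not dims:
        raise ValueError("No array inputs were provided.")
    if len(set(dims)) > 1:
        raise ValueError(f"Inconsistent time dimensions: {sorted(set(dims))}")
    return dims[0]


# An episode of T = 4 steps with an empty policy_info.
T = check_time_dimension(
    observation=np.zeros((4, 3)),
    action=np.zeros((4,), dtype=np.int32),
    policy_info=(),
    reward=np.zeros(4, dtype=np.float32),
)
```

An empty tuple for `policy_info` contributes no leaves, so it does not affect the check.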

Returns an instance of Trajectory.