tf_agents.trajectories.trajectory.first

Create a Trajectory transitioning between StepTypes FIRST and MID.

All inputs may be batched.

The outer shape of the inputs is inferred from the discount input, which is always expected to have a scalar inner shape, so its full shape is exactly the outer shape ([B], [T], or [B, T]).
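The shape inference described above can be illustrated with a minimal NumPy-only sketch. It is not the library implementation: `first_sketch` is a hypothetical stand-in that returns a plain dict instead of a Trajectory, and the integer dtype chosen for the step-type arrays is an assumption. It relies only on the StepType codes FIRST = 0 and MID = 1.

```python
import numpy as np

# Integer codes used for StepType in TF-Agents (FIRST=0, MID=1, LAST=2).
FIRST, MID = 0, 1


def first_sketch(observation, action, policy_info, reward, discount):
    """Hypothetical stand-in for trajectory.first; returns a dict, not a Trajectory."""
    discount = np.asarray(discount, dtype=np.float32)
    # discount has a scalar inner shape, so its full shape is the outer shape.
    outer_shape = discount.shape
    return {
        # Every entry starts an episode (FIRST) and is followed by a MID step.
        "step_type": np.full(outer_shape, FIRST, dtype=np.int32),
        "observation": observation,
        "action": action,
        "policy_info": policy_info,
        "next_step_type": np.full(outer_shape, MID, dtype=np.int32),
        "reward": reward,
        "discount": discount,
    }


# A batch of B=2 sequences with T=3 steps each, so the outer shape is [B, T].
B, T = 2, 3
traj = first_sketch(
    observation=np.zeros((B, T, 4), dtype=np.float32),
    action=np.zeros((B, T), dtype=np.int64),
    policy_info=(),
    reward=np.zeros((B, T), dtype=np.float32),
    discount=np.ones((B, T), dtype=np.float32),
)
print(traj["step_type"].shape)
```

With the real library, the corresponding call would pass the same five inputs to `trajectory.first`, and the step-type fields of the returned Trajectory would take the outer shape of `discount` in the same way.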

Args

observation: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
action: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
policy_info: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
reward: (possibly nested tuple of) Tensor or np.ndarray; all shaped [B, ...], [T, ...], or [B, T, ...].
discount: A floating point vector Tensor or np.ndarray; shaped [B], [T], or [B, T] (optional).

Returns

A Trajectory instance.