tf_agents.trajectories.trajectory.from_episode

Create a Trajectory from tensors representing a single episode.

tf_agents.trajectories.trajectory.from_episode(
    observation, action, policy_info, reward, discount=None
)

If none of the inputs are tensors, then numpy arrays are generated instead.
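
For example (a minimal sketch; the shapes, values, and empty policy_info are illustrative):

import numpy as np
from tf_agents.trajectories import trajectory

# A single 3-step episode given entirely as numpy arrays (T = 3).
traj = trajectory.from_episode(
    observation=np.arange(6, dtype=np.float32).reshape(3, 2),  # [T, ...]
    action=np.array([0, 1, 1], dtype=np.int32),                # [T]
    policy_info=(),                                            # empty nest
    reward=np.array([1.0, 0.0, 2.0], dtype=np.float32))        # [T]

# Since no input was a tf.Tensor, the resulting fields are numpy arrays.
print(type(traj.reward))  # <class 'numpy.ndarray'>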

If discount is not provided, the first entry in reward is used to estimate T:

reward_0 = tf.nest.flatten(reward)[0]
T = tf.shape(reward_0)[0]  # or reward_0.shape[0] when the inputs are numpy arrays

In this case, a discount of all ones with dtype float32 is generated.
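
Concretely (a sketch mirroring the lines above; the nested reward is an illustrative assumption):

import tensorflow as tf

reward = {'task': tf.constant([1.0, 0.0, 2.0])}  # a (possibly nested) reward
reward_0 = tf.nest.flatten(reward)[0]
T = tf.shape(reward_0)[0]
discount = tf.ones([T], dtype=tf.float32)  # what from_episode fills in
print(discount)  # tf.Tensor([1. 1. 1.], shape=(3,), dtype=float32)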

NOTE: All tensors/numpy arrays passed to this function must have the same time dimension T. When the generated trajectory is passed through to_transition, it will only return a (time_steps, next_time_steps) pair with T - 1 in the time dimension, which means the reward at step T is dropped. So if the reward at step T is important, make sure the episode passed to this function contains an additional step.
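
For instance (a sketch; it assumes a recent TF-Agents version in which to_transition takes [B, T]-shaped trajectories and returns a (time_steps, policy_steps, next_time_steps) triple):

import tensorflow as tf
from tf_agents.trajectories import trajectory

traj = trajectory.from_episode(
    observation=tf.zeros([3, 2]),           # T = 3
    action=tf.zeros([3], dtype=tf.int32),
    policy_info=(),
    reward=tf.constant([1.0, 0.0, 2.0]))

# to_transition expects a batch dimension, so lift [T, ...] to [1, T, ...].
batched = tf.nest.map_structure(lambda t: tf.expand_dims(t, 0), traj)
time_steps, policy_steps, next_time_steps = trajectory.to_transition(batched)
print(next_time_steps.reward)  # [[1. 0.]] -- the step-T reward (2.0) is dropped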

Args:

  • observation: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • action: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • policy_info: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • reward: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • discount: A floating-point vector Tensor or np.ndarray; shaped [T] (optional).

Returns:

An instance of Trajectory.