tf_agents.trajectories.trajectory.from_episode

Create a Trajectory from tensors representing a single episode.

tf_agents.trajectories.trajectory.from_episode(
    observation, action, policy_info, reward, discount=None
)

If none of the inputs are tensors, then numpy arrays are generated instead.
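
For example (a minimal sketch; the shapes, values, and empty policy_info are illustrative):

import numpy as np
from tf_agents.trajectories import trajectory

# A single 3-step episode given entirely as numpy arrays (T = 3).
traj = trajectory.from_episode(
    observation=np.arange(6, dtype=np.float32).reshape(3, 2),  # [T, ...]
    action=np.array([0, 1, 1], dtype=np.int32),                # [T]
    policy_info=(),                                            # empty nest
    reward=np.array([1.0, 0.0, 2.0], dtype=np.float32))        # [T]

# Since no input was a tf.Tensor, the resulting fields are numpy arrays.
print(type(traj.reward))  # <class 'numpy.ndarray'>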

If discount is not provided, the first entry in reward is used to estimate T:

reward_0 = tf.nest.flatten(reward)[0]
T = tf.shape(reward_0)[0]  # or reward_0.shape[0] when the inputs are numpy arrays

In this case, a discount of all ones with dtype float32 is generated.
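
Concretely (a sketch mirroring the lines above; the nested reward is an illustrative assumption):

import tensorflow as tf

reward = {'task': tf.constant([1.0, 0.0, 2.0])}  # a (possibly nested) reward
reward_0 = tf.nest.flatten(reward)[0]
T = tf.shape(reward_0)[0]
discount = tf.ones([T], dtype=tf.float32)  # what from_episode fills in
print(discount)  # tf.Tensor([1. 1. 1.], shape=(3,), dtype=float32)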

NOTE: All tensors/numpy arrays passed to this function must have the same time dimension T. When the generated trajectory is passed through to_transition, it will only return a (time_steps, next_time_steps) pair with T - 1 in the time dimension, which means the reward at step T is dropped. So if the reward at step T is important, make sure the episode passed to this function contains an additional step.
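
For instance (a sketch; it assumes a recent TF-Agents version in which to_transition takes [B, T]-shaped trajectories and returns a (time_steps, policy_steps, next_time_steps) triple):

import tensorflow as tf
from tf_agents.trajectories import trajectory

traj = trajectory.from_episode(
    observation=tf.zeros([3, 2]),           # T = 3
    action=tf.zeros([3], dtype=tf.int32),
    policy_info=(),
    reward=tf.constant([1.0, 0.0, 2.0]))

# to_transition expects a batch dimension, so lift [T, ...] to [1, T, ...].
batched = tf.nest.map_structure(lambda t: tf.expand_dims(t, 0), traj)
time_steps, policy_steps, next_time_steps = trajectory.to_transition(batched)
print(next_time_steps.reward)  # [[1. 0.]] -- the step-T reward (2.0) is dropped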

Args:

  • observation: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • action: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • policy_info: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • reward: (possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
  • discount: A floating-point vector Tensor or np.ndarray; shaped [T] (optional).

Returns:

An instance of Trajectory.