Returned with every call to step and reset on an environment.

A TimeStep contains the data emitted by an environment at each step of interaction: a step_type, an observation (typically a NumPy array, or a dict or list of arrays), and an associated reward and discount.

The step_type of the first TimeStep in a sequence equals StepType.FIRST. The step_type of the final TimeStep equals StepType.LAST. All other TimeSteps in the sequence have step_type StepType.MID.
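The ordering rule above can be illustrated with a minimal, standard-library-only sketch. The StepType enum here is a stand-in defined for illustration (its integer values are an assumption), and episode_step_types is a hypothetical helper, not part of the library:

```python
from enum import IntEnum


class StepType(IntEnum):
    """Illustrative stand-in for the StepType enum; values are assumed."""
    FIRST = 0
    MID = 1
    LAST = 2


def episode_step_types(num_steps):
    """Hypothetical helper: step types for an episode of num_steps TimeSteps."""
    if num_steps < 2:
        raise ValueError("an episode needs at least a FIRST and a LAST step")
    # One FIRST, then MID for every interior step, then one LAST.
    return [StepType.FIRST] + [StepType.MID] * (num_steps - 2) + [StepType.LAST]


print([t.name for t in episode_step_types(4)])
# prints ['FIRST', 'MID', 'MID', 'LAST']
```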

step_type: A Tensor or array of StepType enum values.
reward: A Tensor or array of reward values.
discount: A discount value in the range [0, 1].
observation: A NumPy array, or a nested dict, list or tuple of arrays.
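As a rough sketch of how these four fields fit together, the structure can be mimicked with a plain NamedTuple. This is not the library's implementation: the StepType values, the scalar field types, the predicate methods shown, and the example reward and discount values are all assumptions made for illustration.

```python
from enum import IntEnum
from typing import Any, NamedTuple


class StepType(IntEnum):
    """Illustrative stand-in for the StepType enum; values are assumed."""
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    """Sketch of a TimeStep-like record with the four documented fields."""
    step_type: StepType
    reward: float
    discount: float      # expected to lie in [0, 1]
    observation: Any     # e.g. an array, or a nested dict/list/tuple of arrays

    def is_first(self):
        return self.step_type == StepType.FIRST

    def is_mid(self):
        return self.step_type == StepType.MID

    def is_last(self):
        return self.step_type == StepType.LAST


# Assumed example values: zero reward on the first step, zero discount at the end.
first = TimeStep(step_type=StepType.FIRST, reward=0.0, discount=1.0,
                 observation=[0.0, 0.0])
last = TimeStep(step_type=StepType.LAST, reward=1.0, discount=0.0,
                observation=[0.5, -0.2])

print(first.is_first(), last.is_last())
# prints True True
```

Plain Python lists stand in for observation arrays here so the sketch runs without extra dependencies; in practice the fields would hold Tensors or NumPy arrays as described above.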


