Returns a TimeStep with step_type set to StepType.LAST.

Args:
  observation: A NumPy array, tensor, or a nested dict, list or tuple of arrays or tensors.
  reward: A scalar, or 1D NumPy array, or tensor.

Returns:
  A TimeStep.

Raises:
  ValueError: If observations are tensors but reward's statically known rank is not 0 or 1.
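
The behaviour described above can be illustrated with a small standard-library sketch. It is not the library's implementation: `termination_sketch`, the local `StepType` and `TimeStep` definitions, the numeric enum values, and the zero discount on the final step are all assumptions made for illustration. Rewards are modelled as plain Python scalars or flat lists rather than arrays or tensors.

```python
from enum import IntEnum
from typing import Any, NamedTuple


class StepType(IntEnum):
    # Assumed numeric values; only the FIRST/MID/LAST distinction matters here.
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    # Hypothetical stand-in for the TimeStep structure returned by the function.
    step_type: StepType
    reward: Any
    discount: float
    observation: Any


def termination_sketch(observation: Any, reward: Any) -> TimeStep:
    """Hypothetical helper: build the final TimeStep of an episode."""
    # Accept a scalar reward (rank 0) or a flat sequence of rewards (rank 1).
    is_sequence = isinstance(reward, (list, tuple))
    if is_sequence and any(isinstance(r, (list, tuple)) for r in reward):
        raise ValueError("reward must have rank 0 or 1")
    # Assumption: a terminal step carries a discount of zero, since no
    # future rewards follow it. Batching of step_type is not modelled.
    return TimeStep(
        step_type=StepType.LAST,
        reward=reward,
        discount=0.0,
        observation=observation,
    )


final_step = termination_sketch(observation={"position": [0.0, 1.0]}, reward=1.0)
print(final_step.step_type.name, final_step.reward)
```

The sketch keeps the observation untouched, whatever its nesting, and only validates the reward, which mirrors the split between the two arguments in the description above.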