Missed TensorFlow Dev Summit? Check out the video playlist. Watch recordings


View source on GitHub

Returns a TimeStep with step_type set equal to StepType.MID.

    observation, reward, discount=1.0

Used in the notebooks

Used in the tutorials

For TF transitions, the batch size is inferred from the shape of reward.

If discount is a scalar, and observation contains Tensors, then discount will be broadcasted to match reward.shape.


  • observation: A NumPy array, tensor, or a nested dict, list or tuple of arrays or tensors.
  • reward: A scalar, or 1D NumPy array, or tensor.
  • discount: (optional) A scalar, or 1D NumPy array, or tensor.


A TimeStep.


  • ValueError: If observations are tensors but reward's statically known rank is not 0 or 1.