Returned with every call to policy.action() and policy.distribution().

action: An action tensor or action distribution for TFPolicy, or a numpy array for PyPolicy.
state: State of the policy to be fed back into the next call to policy.action() or policy.distribution(), e.g. an RNN state. For stateless policies, this will be an empty tuple.
info: Auxiliary information emitted by the policy, e.g. log probabilities of the actions. For policies without info, this will be an empty tuple.
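
To illustrate how the state field is threaded from one call to the next, here is a minimal sketch in plain Python. It does not use the library itself: PolicyStep below is a stand-in namedtuple with the three fields described above, and toy_policy_action is a hypothetical function invented for this example, not a real policy.

```python
from collections import namedtuple

# Stand-in for the library's PolicyStep (assumption: state and info
# default to empty tuples, as described for stateless / info-free policies).
PolicyStep = namedtuple(
    "PolicyStep", ["action", "state", "info"], defaults=((), (), ())
)


def toy_policy_action(observation, policy_state=()):
    """Hypothetical stateful policy: doubles the observation and counts calls."""
    calls = policy_state[0] + 1 if policy_state else 1
    return PolicyStep(action=observation * 2, state=(calls,))


policy_state = ()  # initial state: empty tuple
actions = []
for observation in [1, 2, 3]:
    step = toy_policy_action(observation, policy_state)
    actions.append(step.action)
    policy_state = step.state  # feed the state back into the next call
```

After the loop, the toy state has been carried across all three calls, and step.info is still the empty tuple because this toy policy emits no auxiliary information.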



Exposes namedtuple._replace as replace().


  new_policy_step = policy_step.replace(action=())

This returns a new policy step with an empty action.

**kwargs: Key/value pairs of the policy step fields to replace.

Returns a new PolicyStep.
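
The behavior described above can be sketched without the library. The class below is a stand-in that wraps namedtuple._replace in a replace method; it is an assumption about the general shape, not the library's actual source, and the info value is made up for illustration.

```python
from collections import namedtuple


class PolicyStep(namedtuple("PolicyStep", ["action", "state", "info"])):
    """Stand-in for the library class, for illustration only."""

    __slots__ = ()

    def replace(self, **kwargs):
        # Delegate to namedtuple._replace, which builds a new tuple.
        return self._replace(**kwargs)


# Illustrative values; the info dict is hypothetical.
policy_step = PolicyStep(action=1, state=(), info={"log_probability": -0.5})
new_policy_step = policy_step.replace(action=())
```

Because namedtuples are immutable, policy_step is left unchanged; only the fields passed as keyword arguments differ in new_policy_step, and the remaining fields are carried over as-is.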