A PolicyStep is returned with every call to policy.action() and policy.distribution().


action: An action tensor or action distribution for TFPolicy, or a numpy array for PyPolicy.

state: During inference, holds the state of the policy to be fed back into the next call to policy.action() or policy.distribution(), e.g. an RNN state. During training, holds the state that is input to policy.action() or policy.distribution(). For stateless policies, this is an empty tuple.

info: Auxiliary information emitted by the policy, e.g. log probabilities of the actions. For policies without info, this is an empty tuple.
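The three fields above can be mirrored with a plain namedtuple for illustration. This is a minimal stand-in sketch, not the real class: the actual PolicyStep lives in tf_agents.trajectories.policy_step, but it is likewise a namedtuple whose state and info fields default to empty tuples.

```python
import collections

# Illustrative stand-in for tf_agents' PolicyStep (assumption: same
# three fields, with empty-tuple defaults for state and info).
PolicyStep = collections.namedtuple("PolicyStep", ("action", "state", "info"))
PolicyStep.__new__.__defaults__ = ((), ())  # state=(), info=()

# A stateless policy's step: only an action; state and info default to ().
step = PolicyStep(action=1)
print(step.action)  # 1
print(step.state)   # ()
print(step.info)    # ()
```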




replace: Exposes namedtuple._replace, returning a copy of the PolicyStep with the given fields replaced.


  new_policy_step = policy_step.replace(action=())

This returns a new policy step with an empty action.

Args:
  **kwargs: Key/value pairs of fields in the policy step.

Returns:
  A new PolicyStep.
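Since replace delegates to namedtuple._replace, the original step is left untouched and a new tuple is returned. A minimal sketch using a hypothetical stand-in namedtuple (not the real tf_agents class):

```python
import collections

# Hypothetical stand-in with the same fields as PolicyStep.
PolicyStep = collections.namedtuple("PolicyStep", ("action", "state", "info"))

step = PolicyStep(action=(1, 2), state=(), info=())
# _replace returns a new tuple; the original step is unchanged.
new_step = step._replace(action=())
print(new_step.action)  # ()
print(step.action)      # (1, 2)
```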