Returns actions from the given configuration.

Inherits From: PyPolicy

time_step_spec A time_step_spec for the environment the policy will interact with.
action_spec An action_spec for the environment the policy will interact with.
action_script A list of 2-tuples of the form (n, nest), where the nest of actions follows the action_spec. Each action will be executed for n steps.
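
The action_script semantics described above (each action repeated for n steps, in order) can be illustrated without the library. The following is a minimal pure-Python sketch and not the TF-Agents implementation: expand_action_script is a hypothetical helper introduced only for illustration, and plain Python lists stand in for the np.Array actions.

```python
from typing import Any, Iterator, List, Tuple


def expand_action_script(action_script: List[Tuple[int, Any]]) -> Iterator[Any]:
    """Yield one action per step, repeating each scripted action n times.

    Hypothetical helper for illustration only; it is not part of TF-Agents.
    """
    for n, action in action_script:
        for _ in range(n):
            yield action


# Assumed example script: action [0] for 2 steps, then action [1] for 3 steps.
action_script = [(2, [0]), (3, [1])]
per_step_actions = list(expand_action_script(action_script))
print(per_step_actions)  # [[0], [0], [1], [1], [1]]
```

Under this reading, the total episode length covered by a script is the sum of its n values; here that is 5 steps.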

action_spec Describes the ArraySpecs of the np.Array returned by action().

action can be a single np.Array, or a nested dict, list or tuple of np.Array.

collect_data_spec Describes the data collected when using this policy with an environment.
info_spec Describes the Arrays emitted as info by action().

policy_state_spec Describes the arrays expected by functions with policy_state as input.
policy_step_spec Describes the output of action().
time_step_spec Describes the TimeStep np.Arrays expected by action(time_step).
trajectory_spec Describes the data collected when using this policy with an environment.



Generates next action given the time_step and policy_state.

time_step A TimeStep tuple corresponding to time_step_spec().
policy_state An optional previous policy_state.
seed Seed to use if action uses sampling (optional).

A PolicyStep named tuple containing:
action A nest of action Arrays matching the action_spec().
state A nest of policy states to be fed into the next call to action.
info Optional side information such as action log probabilities.
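
The returned state is meant to be passed back into the next call to action. The sketch below shows that calling pattern with a hypothetical stand-in class, ToyScriptedPolicy, that mimics only the documented interface (action, get_initial_state, and a PolicyStep with action, state, and info fields). The state layout (script index, repeats so far) and the ValueError on an exhausted script are assumptions of this sketch, not a statement about the library's internals.

```python
from collections import namedtuple

# Stand-in for the PolicyStep named tuple (same three field names as documented).
PolicyStep = namedtuple("PolicyStep", ["action", "state", "info"])


class ToyScriptedPolicy:
    """Hypothetical stand-in that mimics the documented interface only."""

    def __init__(self, action_script):
        self._action_script = action_script

    def get_initial_state(self, batch_size=None):
        # Assumed state layout: (index into the script, repeats done so far).
        return (0, 0)

    def action(self, time_step, policy_state=None, seed=None):
        if policy_state is None:
            policy_state = self.get_initial_state()
        index, repeats = policy_state
        # Move to the next entry once the current one has run n times.
        while (index < len(self._action_script)
               and repeats >= self._action_script[index][0]):
            index, repeats = index + 1, 0
        if index >= len(self._action_script):
            raise ValueError("Episode is longer than the action script.")
        _, current_action = self._action_script[index]
        return PolicyStep(action=current_action,
                          state=(index, repeats + 1),
                          info=())


policy = ToyScriptedPolicy([(1, "left"), (2, "right")])
state = policy.get_initial_state()
actions = []
for _ in range(3):
    # time_step is unused by this toy policy, so None is passed as a placeholder.
    step = policy.action(time_step=None, policy_state=state)
    actions.append(step.action)
    state = step.state  # feed the returned state into the next call

print(actions)  # ['left', 'right', 'right']
```

Keeping the script position in the returned state, rather than on the policy object, means the caller decides when an episode restarts: passing a fresh get_initial_state() result replays the script from the beginning.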


Returns an initial state usable by the policy.

batch_size An optional batch size.

An initial policy state.