Returns random samples of the given action_spec.
tf_agents.policies.random_py_policy.RandomPyPolicy( time_step_spec, action_spec, seed=None, outer_dims=None, observation_and_action_constraint_splitter=None )
time_step_spec: If not None and outer_dims is not provided, this is used to infer the outer_dims required for the given time_step when action is called.
action_spec: A nest of BoundedArraySpec representing the actions to sample from.
seed: Optional seed used to instantiate a random number generator.
outer_dims: An optional list/tuple specifying outer dimensions to add to the spec shape before sampling. If unspecified, the outer_dims are derived from the outer_dims in the given observation when action is called.
observation_and_action_constraint_splitter: A function used to process observations with action constraints. These constraints can indicate, for example, a mask of valid/invalid actions for a given state of the environment. The function takes in a full observation and returns a tuple consisting of 1) the part of the observation intended as input to the network and 2) the constraint. An example observation_and_action_constraint_splitter could be as simple as:
    def observation_and_action_constraint_splitter(observation):
        return observation['network_input'], observation['constraint']
Note: when using observation_and_action_constraint_splitter, make sure the provided q_network is compatible with the network-specific half of the output of the observation_and_action_constraint_splitter. In particular, observation_and_action_constraint_splitter will be called on the observation before passing to the network. If observation_and_action_constraint_splitter is None, action constraints are not applied.
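As a rough illustration of the splitter contract described above, the following plain-Python sketch separates a dict observation into the network input and an action mask, then draws a random action among the valid ones. It does not call TF-Agents. The keys `network_input` and `constraint` come from the example above; the helper `sample_valid_action` and the observation values are hypothetical, and the way RandomPyPolicy itself applies the mask may differ.

```python
import random


def observation_and_action_constraint_splitter(observation):
    """Return (network input, constraint) from a dict observation."""
    return observation['network_input'], observation['constraint']


def sample_valid_action(observation, rng):
    """Hypothetical helper: pick uniformly among actions whose mask entry is 1."""
    _, mask = observation_and_action_constraint_splitter(observation)
    valid_actions = [index for index, allowed in enumerate(mask) if allowed]
    if not valid_actions:
        raise ValueError('The mask marks every action as invalid.')
    return rng.choice(valid_actions)


# Made-up observation: four discrete actions, of which 1 and 3 are allowed.
observation = {
    'network_input': [0.2, -1.3, 0.7],
    'constraint': [0, 1, 0, 1],
}

rng = random.Random(0)  # seeded so repeated runs draw the same sequence
samples = [sample_valid_action(observation, rng) for _ in range(20)]
```

Every entry of `samples` is an index that the mask allows, and the network input is passed through untouched, which is the property the splitter is meant to guarantee.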
action_spec: Describes the ArraySpecs of the np.Array returned by action(). action can be a single np.Array, or a nested dict, list or tuple of np.Array.
info_spec: Describes the Arrays emitted as info by action().
policy_state_spec: Describes the arrays expected by functions with policy_state as input.
policy_step_spec: Describes the output of action().
time_step_spec: Describes the TimeStep np.Arrays expected by action(time_step).
trajectory_spec: Describes the data collected when using this policy with an environment.
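To make the relationship between a spec's shape, outer_dims, and the sampled arrays more concrete, here is a minimal sketch in plain Python. It is not the TF-Agents implementation: `sample_from_spec` is a hypothetical helper, nested lists stand in for np.Arrays, and the bounds are assumed to be inclusive integers.

```python
import random


def sample_from_spec(shape, minimum, maximum, outer_dims=(), seed=None):
    """Hypothetical helper: uniform integer samples shaped outer_dims + shape."""
    rng = random.Random(seed)
    full_shape = tuple(outer_dims) + tuple(shape)

    def build(dims):
        # Recurse over the dimensions; the innermost values are random draws.
        if not dims:
            return rng.randint(minimum, maximum)
        return [build(dims[1:]) for _ in range(dims[0])]

    return full_shape, build(full_shape)


# A spec of shape (2,) with bounds [0, 3], sampled for a batch of 4.
full_shape, batch = sample_from_spec(
    shape=(2,), minimum=0, maximum=3, outer_dims=(4,), seed=7)
```

Here `full_shape` is `(4, 2)`: the outer dimension is prepended to the spec shape, so each of the four batch entries holds one sample of the spec.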
action( time_step, policy_state=() )
Generates next action given the time_step and policy_state.
time_step: A TimeStep tuple corresponding to time_step_spec.
policy_state: An optional previous policy_state.
A PolicyStep named tuple containing:
action: A nest of action Arrays matching the action_spec.
state: A nest of policy states to be fed into the next call to action.
info: Optional side information such as action log probabilities.
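The calling pattern implied by this return value can be sketched as follows. The example uses stand-ins only: `PolicyStep` here is a local namedtuple with the three fields listed above, `ToyRandomPolicy` is a hypothetical class that mimics the action() contract, and integers take the place of real TimeStep tuples. None of it is TF-Agents code.

```python
import collections
import random

# Stand-in with the three fields listed above; not the TF-Agents class.
PolicyStep = collections.namedtuple('PolicyStep', ['action', 'state', 'info'])


class ToyRandomPolicy:
    """Hypothetical stand-in that mimics the action() contract."""

    def __init__(self, minimum, maximum, seed=None):
        self._minimum = minimum
        self._maximum = maximum
        self._rng = random.Random(seed)

    def get_initial_state(self, batch_size=None):
        # A stateless policy has nothing to carry, so the state is empty.
        return ()

    def action(self, time_step, policy_state=()):
        # The time step is ignored: sampling is uniform over the bounds.
        sampled = self._rng.randint(self._minimum, self._maximum)
        return PolicyStep(action=sampled, state=policy_state, info=())


policy = ToyRandomPolicy(minimum=0, maximum=3, seed=42)
policy_state = policy.get_initial_state()
actions = []
for time_step in range(5):  # integers stand in for real TimeStep tuples
    policy_step = policy.action(time_step, policy_state)
    actions.append(policy_step.action)
    policy_state = policy_step.state  # feed the state into the next call
```

For a random policy the state stays empty throughout, but threading it through each call keeps the loop valid for stateful policies as well.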
get_initial_state( batch_size=None )
Returns an initial state usable by the policy.
batch_size: An optional batch size.
An initial policy state.