Exposes a Python policy as a wrapper over a TF Policy.
tf_agents.policies.py_tf_policy.PyTFPolicy( policy, batch_size=None, seed=None )
policy: A TF Policy (an instance of tf_agents.policies.tf_policy.TFPolicy).
seed: Seed to use if policy performs random actions (optional).
action_spec: Describes the ArraySpecs of the np.Array returned by action. action can be a single np.Array, or a nested dict, list or tuple of np.Array.
info_spec: Describes the Arrays emitted as info by action.
policy_state_spec: Describes the arrays expected by functions with policy_state as input.
policy_step_spec: Describes the output of action.
session: Returns the TensorFlow session-like object used by this object.
time_step_spec: Describes the TimeStep np.Arrays expected by action.
trajectory_spec: Describes the data collected when using this policy with an environment.
action( time_step, policy_state=() )
Generates next action given the time_step and policy_state.
time_step: A TimeStep tuple corresponding to time_step_spec.
policy_state: An optional previous policy_state.
A PolicyStep named tuple containing:
action: A nest of action Arrays matching the action_spec.
state: A nest of policy states to be fed into the next call to action.
info: Optional side information such as action log probabilities.
get_initial_state( batch_size=None )
Returns an initial state usable by the policy.
batch_size: An optional batch size.
An initial policy state.
initialize( batch_size, graph=None )
restore( policy_dir, graph=None, assert_consumed=True )
Restores the policy from the checkpoint.
policy_dir: Directory with the checkpoint.
graph: A graph, inside which the policy is restored (optional).
assert_consumed: If True, the contents of the checkpoint are checked for a match against the graph variables.
step: Global step associated with the restored policy checkpoint.
RuntimeError: if the policy is not initialized.
AssertionError: if the checkpoint contains variables which do not have matching names in the graph, and assert_consumed is set to True.
save( policy_dir=None, graph=None )