tf_agents.policies.SavedModelPyTFEagerPolicy

Exposes a numpy API for saved_model policies in Eager mode.

Inherits From: PyTFEagerPolicyBase, PyPolicy


Args
model_path: Path to a saved_model generated by the policy_saver.
time_step_spec: Optional nested structure of ArraySpecs describing the policy's time_step_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
action_spec: Optional nested structure of ArraySpecs describing the policy's action_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
policy_state_spec: Optional nested structure of ArraySpecs describing the policy's policy_state_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
info_spec: Optional nested structure of ArraySpecs describing the policy's info_spec. This is not used by the SavedModelPyTFEagerPolicy, but may be accessed by other objects as it is part of the public policy API.
load_specs_from_pbtxt: If True, the specs will be loaded from the proto file generated by the policy_saver.
use_tf_function: See PyTFEagerPolicyBase.
batch_time_steps: See PyTFEagerPolicyBase.
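A minimal construction sketch (the 'saved_policy' path is a placeholder for whatever directory was passed to PolicySaver.save(), and the tf_agents package is assumed to be installed):

```python
# Minimal loading sketch. 'saved_policy' is a placeholder: substitute the
# directory previously passed to
# tf_agents.policies.policy_saver.PolicySaver.save().
def load_saved_policy(model_path='saved_policy'):
    # Import deferred so the sketch only needs tf_agents when actually run.
    from tf_agents.policies import py_tf_eager_policy

    # load_specs_from_pbtxt=True recovers the specs from the proto file
    # written by the policy_saver, so they need not be passed in manually.
    return py_tf_eager_policy.SavedModelPyTFEagerPolicy(
        model_path=model_path, load_specs_from_pbtxt=True)
```

With load_specs_from_pbtxt=False, the spec arguments above would have to be supplied explicitly to match the exported policy.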

Attributes
action_spec: Describes the ArraySpecs of the np.Array returned by action(). action can be a single np.Array, or a nested dict, list or tuple of np.Array.
collect_data_spec: Describes the data collected when using this policy with an environment.
info_spec: Describes the Arrays emitted as info by action().
observation_and_action_constraint_splitter
policy_state_spec: Describes the arrays expected by functions with policy_state as input.
policy_step_spec: Describes the output of action().
time_step_spec: Describes the TimeStep np.Arrays expected by action(time_step).
trajectory_spec: Describes the data collected when using this policy with an environment.
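Once loaded, these attributes are plain properties and can be read directly. A small illustrative helper (the policy argument is assumed to be an already-loaded SavedModelPyTFEagerPolicy; the helper itself is not part of the API):

```python
# Illustrative only: gather the public spec properties of a loaded policy
# into a dict, e.g. for logging or debugging. Each attribute below is a
# property, not a method.
def describe_policy_specs(policy):
    return {
        'time_step_spec': policy.time_step_spec,
        'action_spec': policy.action_spec,
        'policy_state_spec': policy.policy_state_spec,
        'info_spec': policy.info_spec,
        'collect_data_spec': policy.collect_data_spec,
    }
```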

Methods

action

Generates next action given the time_step and policy_state.

Args
time_step: A TimeStep tuple corresponding to time_step_spec().
policy_state: An optional previous policy_state.
seed: Seed to use if action uses sampling (optional).

Returns
A PolicyStep named tuple containing:
action: A nest of action Arrays matching the action_spec().
state: A nest of policy states to be fed into the next call to action.
info: Optional side information such as action log probabilities.
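The calling contract can be sketched with a numpy-only stand-in. The stub below only mirrors the (action, state, info) shape of tf_agents.trajectories.policy_step.PolicyStep; it is not the real class:

```python
import collections
import numpy as np

# Stand-in mirroring the (action, state, info) triple of
# tf_agents.trajectories.policy_step.PolicyStep. Illustration only.
PolicyStep = collections.namedtuple('PolicyStep', ('action', 'state', 'info'))

class StubPolicy:
    """Stateless stub with the same action()/get_initial_state() shape."""

    def get_initial_state(self, batch_size=None):
        return ()  # no recurrent state

    def action(self, time_step, policy_state=(), seed=None):
        # A real policy would compute this from time_step; here a fixed
        # scalar action matching an int64 action_spec is returned.
        return PolicyStep(action=np.int64(0), state=policy_state, info=())

policy = StubPolicy()
state = policy.get_initial_state()
step = policy.action(time_step=None, policy_state=state)
# step.action is fed to the environment; step.state goes into the next
# call to action().
```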

get_initial_state

Returns an initial state usable by the policy.

Args
batch_size: An optional batch size.

Returns
An initial policy state.

get_metadata

Returns the metadata of the saved model.

get_train_step

Returns the training global step of the saved model.

get_train_step_from_last_restored_checkpoint_path

Returns the training step of the restored checkpoint.

update_from_checkpoint

Allows users to update saved_model variables directly from a checkpoint.

checkpoint_path is a path that was passed to either PolicySaver.save() or PolicySaver.save_checkpoint(). The policy looks for a set of checkpoint files with the file prefix `/variables/variables`.

Args
checkpoint_path: Path to the checkpoint to restore and use to update this policy.
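A hypothetical refresh step for long-running evaluation: both the policy argument (an already-loaded SavedModelPyTFEagerPolicy) and the checkpoint path (output of PolicySaver.save_checkpoint()) are assumptions, not values defined on this page:

```python
# Hypothetical helper: pull newer weights into an already-loaded policy
# without re-reading the whole SavedModel, then report the train step of
# the checkpoint that was just restored.
def refresh_policy(policy, checkpoint_path):
    policy.update_from_checkpoint(checkpoint_path)
    return policy.get_train_step_from_last_restored_checkpoint_path()
```

This avoids the cost of constructing a new SavedModelPyTFEagerPolicy each time the training job exports fresh weights.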

variables
