A nest of BoundedTensorSpec representing the actions.
An instance of a tf_agents.networks.Network,
callable via network(observation, step_type) -> (output, final_state).
A function used to mask
valid and invalid actions for each state of the environment. The function
takes in a full observation and returns a tuple consisting of 1) the
part of the observation intended as input to the network and 2) the
mask. The mask should be a 0-1 Tensor of shape
[batch_size, num_actions]. The function should also accept a
TensorSpec as input, and in that case output TensorSpec objects for the
observation and mask.
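The splitter contract above can be sketched in plain Python (in the real library the observation, network input, and mask would be Tensors or TensorSpecs; the "state"/"mask" keys here are hypothetical, chosen only for illustration):

```python
def observation_and_action_constraint_splitter(observation):
    """Splits a full observation into (network_input, mask).

    Assumes the full observation is a dict with a "state" entry (the part
    fed to the network) and a "mask" entry (the 0-1 action-validity mask).
    Because it only performs key lookups, the same function works on
    concrete values and on spec-like objects.
    """
    return observation["state"], observation["mask"]

# Example full observation: three state features, four actions, action 2 invalid.
obs = {"state": [0.1, 0.2, 0.3], "mask": [1, 1, 0, 1]}
net_input, mask = observation_and_action_constraint_splitter(obs)
```

The same callable is applied to specs at construction time and to batched observations at run time, which is why it must handle both.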
(tuple of strings) The side information to emit
as part of the policy info. Allowed values can be found in
The name of this policy. All variables in this module will fall
under that name. Defaults to the class name.
If action_spec contains more than one
BoundedTensorSpec or the BoundedTensorSpec is not valid.
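Downstream, the mask returned by the splitter is typically combined with the network's output scores so that only valid actions can be selected greedily. A minimal plain-Python sketch of that selection step (not the library's implementation, which operates on Tensors):

```python
def masked_argmax(scores, mask):
    """Returns the index of the highest-scoring action whose mask entry is 1.

    `scores` is one score per action (e.g. Q-values); `mask` is the 0-1
    validity mask described above, for a single (unbatched) state.
    """
    best_action, best_score = None, float("-inf")
    for a, (s, m) in enumerate(zip(scores, mask)):
        if m == 1 and s > best_score:
            best_action, best_score = a, s
    if best_action is None:
        raise ValueError("The mask disallows every action.")
    return best_action

# Action 1 has the highest score but is masked out, so action 0 wins.
greedy = masked_argmax([0.5, 2.0, 0.1], [1, 0, 1])
```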
Describes the TensorSpecs of the Tensors expected by step(action).
action can be a single Tensor, or a nested dict, list or tuple of Tensors.
Describes the Tensors written when using this policy with an environment.
Whether this policy instance emits log probabilities or not.
Describes the Tensors emitted as info by action and distribution.
info can be an empty tuple, a single Tensor, or a nested dict,
list or tuple of Tensors.
Returns the name of this module, as passed to or determined in the constructor.
Generates next action given the time_step and policy_state.
A TimeStep tuple corresponding to time_step_spec().
A Tensor, or a nested dict, list or tuple of Tensors
representing the previous policy_state.
Seed to use if action performs sampling (optional).
A PolicyStep named tuple containing:
action: An action Tensor matching the action_spec().
state: A policy state tensor to be fed into the next call to action.
info: Optional side information such as action log probabilities.
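The action() contract can be illustrated with a plain-Python stand-in for the PolicyStep tuple (TF-Agents defines the real one with the same action/state/info fields; the dict-shaped time_step and the random-valid-action rule here are illustrative assumptions, where a real policy would run the network on the split observation):

```python
import collections
import random

# Stand-in for the library's PolicyStep named tuple.
PolicyStep = collections.namedtuple("PolicyStep", ["action", "state", "info"])

def action(time_step, policy_state=(), seed=None):
    """Returns a PolicyStep for the given time_step and policy_state.

    Samples uniformly among actions the mask marks as valid, using `seed`
    for reproducibility; `state` is carried through to the next call and
    `info` is left empty.
    """
    rng = random.Random(seed)
    mask = time_step["observation"]["mask"]
    valid_actions = [i for i, m in enumerate(mask) if m == 1]
    return PolicyStep(action=rng.choice(valid_actions),
                      state=policy_state, info=())

# Only action 1 is valid, so it must be chosen regardless of the seed.
step = action({"observation": {"state": [0.0], "mask": [0, 1, 0]}}, seed=0)
```

Feeding `step.state` back into the next action() call is what lets stateful (e.g. RNN-based) policies work.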