Stationary Stochastic Bandit environment with per-arm features.
Inherits From: BanditPyEnvironment, PyEnvironment
tf_agents.bandits.environments.stationary_stochastic_per_arm_py_environment.StationaryStochasticPerArmPyEnvironment(
    global_context_sampling_fn: Callable[[], tf_agents.typing.types.Array],
    arm_context_sampling_fn: Callable[[], tf_agents.typing.types.Array],
    max_num_actions: int,
    reward_fn: Callable[[tf_agents.typing.types.Array], Sequence[float]],
    num_actions_fn: Optional[Callable[[], int]] = None,
    batch_size: Optional[int] = 1,
    name: Optional[Text] = 'stationary_stochastic_per_arm'
)
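The sketch below shows one way to construct the environment, following the pattern used in the TF-Agents per-arm bandits tutorial. The feature dimensions, NUM_ACTIONS, BATCH_SIZE, and the linear reward model are illustrative assumptions, not library defaults; the reward function is assumed to receive the concatenated global and per-arm feature vector.

```python
# Illustrative construction sketch; constants and the reward model are assumptions.
import numpy as np
from tf_agents.bandits.environments import stationary_stochastic_per_arm_py_environment as per_arm_env

GLOBAL_DIM, PER_ARM_DIM, NUM_ACTIONS, BATCH_SIZE = 4, 5, 10, 2
reward_param = np.random.uniform(-1.0, 1.0, GLOBAL_DIM + PER_ARM_DIM)

def global_context_sampling_fn():
  # Samples one global feature vector per time step.
  return np.random.randint(-10, 10, [GLOBAL_DIM]).astype(np.float32)

def arm_context_sampling_fn():
  # Samples one feature vector per arm; called independently for each arm.
  return np.random.randint(-2, 3, [PER_ARM_DIM]).astype(np.float32)

def reward_fn(x):
  # `x` is assumed to be the concatenated global and per-arm feature vector.
  return np.dot(x, reward_param) + np.random.normal(0.0, 0.1)

env = per_arm_env.StationaryStochasticPerArmPyEnvironment(
    global_context_sampling_fn,
    arm_context_sampling_fn,
    NUM_ACTIONS,
    reward_fn,
    batch_size=BATCH_SIZE)
```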
Attributes
batch_size: The batch size of the environment.
name: The name of this environment.
Methods
action_spec
action_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the actions that should be provided to step().
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
Returns
An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
batched
batched() -> bool
Whether the environment is batched or not.
If the environment supports batched observations and actions, then override this property to return True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means that for all NumPy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
When batched and handle_auto_reset is enabled, the auto-reset check uses np.all(steps.is_last()).
Returns
A boolean indicating whether the environment is batched or not.
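As a minimal sketch of what the batch dimension means in practice (assuming the `env` built in the constructor example above):

```python
# With a batched environment, actions and observation arrays carry the batch
# size as their leading dimension.
first_step = env.reset()
if env.batched:
  # One integer action per batch entry; each observation array's shape
  # likewise starts with env.batch_size.
  actions = np.zeros([env.batch_size], dtype=np.int32)
  next_step = env.step(actions)
```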
close
close() -> None
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
env = Env(...)
# Use env.
env.close()
or via a context manager:
with Env(...) as env:
# Use env.
current_time_step
current_time_step() -> tf_agents.trajectories.TimeStep
Returns the current timestep.
discount_spec
discount_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the discounts that are returned by step().
Override this method to define an environment that uses non-standard discount values, for example an environment with array-valued discounts.
Returns
An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
get_info
get_info() -> tf_agents.typing.types.NestedArray
Returns the environment info returned on the last step.
Returns
Info returned by last call to step(). None by default.
Raises
NotImplementedError: If the environment does not use info.
get_state
get_state() -> Any
Returns the state of the environment.
The state contains everything required to restore the environment to the current configuration. This can contain, e.g.:
- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).
Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.
Note that the returned state handle should not be modified by the environment later on, and ensuring this (e.g. using copy.deepcopy) is the responsibility of the environment.
Returns
state: The current state of the environment.
observation_spec
observation_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the observations provided by the environment.
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
Returns
An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
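A quick way to see the concrete structure this per-arm environment exposes (a sketch, assuming `env` from the constructor example) is simply to print the specs; the exact keys and shapes depend on how the environment was constructed:

```python
# Inspect the nested observation spec and the action spec.
print(env.observation_spec())
print(env.action_spec())
```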
render
render(
mode: Text = 'rgb_array'
) -> Optional[types.NestedArray]
Renders the environment.
Args
mode: One of ['rgb_array', 'human']. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is 'rgb_array'. Otherwise, returns nothing and renders directly to a display window.
Raises
NotImplementedError: If the environment does not support rendering.
reset
reset() -> tf_agents.trajectories.TimeStep
Starts a new sequence and returns the first TimeStep of this sequence.
Returns
A TimeStep namedtuple containing:
- step_type: A StepType of FIRST.
- reward: 0.0, indicating the reward.
- discount: 1.0, indicating the discount.
- observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
reward_spec
reward_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the rewards that are returned by step().
Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.
Returns
An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
seed
seed(
seed: tf_agents.typing.types.Seed
) -> Any
Seeds the environment.
Args
seed: Value to use as seed for the environment.
set_state
set_state(
state: Any
) -> None
Restores the environment to a given state.
See the definition of state in the documentation for get_state().
Args
state: A state to restore the environment to.
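A minimal save/restore sketch, assuming the concrete environment implements both get_state() and set_state() (implementations that do not may raise NotImplementedError), and reusing `env` from the constructor example:

```python
# Treat the returned state as an opaque token and pass it back verbatim.
snapshot = env.get_state()
env.step(np.zeros([env.batch_size], dtype=np.int32))  # advance the environment
env.set_state(snapshot)  # roll back to the snapshotted configuration
```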
should_reset
should_reset(
current_time_step: tf_agents.trajectories.TimeStep
) -> bool
Whether the Environment should reset given the current timestep.
By default it only resets when all time_steps are LAST.
Args
current_time_step: The current TimeStep.
Returns
A bool indicating whether the Environment should reset or not.
step
step(
action: tf_agents.typing.types.NestedArray
) -> tf_agents.trajectories.TimeStep
Updates the environment according to the action and returns a TimeStep.
If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.
This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.
If should_reset(current_time_step) is True, then this method will reset by itself. In this case action will be ignored.
Args
action: A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec().
Returns
A TimeStep namedtuple containing:
- step_type: A StepType value.
- reward: A NumPy array, reward value for this timestep.
- discount: A NumPy array, discount in the range [0, 1].
- observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
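Putting reset and step together, here is a hedged end-to-end sketch of the interaction loop, assuming `env` and NUM_ACTIONS from the constructor example above; actions are chosen uniformly at random rather than by a bandit policy:

```python
# Drive the environment for a few steps with random valid actions.
time_step = env.reset()
for _ in range(5):
  actions = np.random.randint(
      0, NUM_ACTIONS, size=[env.batch_size]).astype(np.int32)
  time_step = env.step(actions)
  print(time_step.reward)  # one reward per batch entry
```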
time_step_spec
time_step_spec() -> tf_agents.trajectories.TimeStep
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with array-valued rewards.
Returns
A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure.
__enter__
__enter__()
Allows the environment to be used in a with-statement context.
__exit__
__exit__(
unused_exception_type, unused_exc_value, unused_traceback
)
Allows the environment to be used in a with-statement context.