
Wraps an environment and flattens nested multi-dimensional observations.

Inherits From: PyEnvironmentBaseWrapper


The observation returned by the environment is a multi-dimensional sequence of items of varying lengths.

timestep.observation_spec = {'position': ArraySpec(shape=(4,), dtype=float32), 'target': ArraySpec(shape=(5,), dtype=float32)}

timestep.observation = {'position': [1,2,3,4], 'target': [5,6,7,8,9]}

By packing the observation, we reduce the dimensions into a single dimension and concatenate the values of all the observations into one array.

timestep.observation_spec = ArraySpec(shape=(9,), dtype=float32, name='packed_observations')

timestep.observation = [1,2,3,4,5,6,7,8,9] # Array of len-9.
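The packing step above can be sketched in plain Python. This is a minimal illustration, not the wrapper's implementation: `pack_observation` is a hypothetical helper, and visiting the keys in sorted order is an assumption made here so that the layout is deterministic.

```python
def pack_observation(observation):
    """Concatenates every value of a dict observation into one flat list."""
    packed = []
    # Assumption: keys are visited in sorted order to get a stable layout.
    for key in sorted(observation):
        packed.extend(float(value) for value in observation[key])
    return packed


observation = {'position': [1, 2, 3, 4], 'target': [5, 6, 7, 8, 9]}
packed = pack_observation(observation)
print(len(packed))
```

The packed length is the sum of the lengths of the individual entries, which is why the spec in the example has shape (9,).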

env A py_environment.PyEnvironment environment to wrap.
observations_whitelist A list of observation keys to keep from the environment. All other observations are filtered out. If not provided, all observations are kept. If provided, the environment is expected to return a dictionary of observations.

ValueError If the current environment does not return a dictionary of observations and an observations whitelist is provided.
ValueError If the observation whitelist keys are not found in the environment.
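The filtering and the two error conditions described above could look roughly like the following. This is a library-free sketch: `filter_observations` is a hypothetical helper and the error messages are invented for illustration.

```python
def filter_observations(observation, whitelist=None):
    """Keeps only the whitelisted keys of a dict observation."""
    if whitelist is None:
        # No whitelist: every observation is kept unchanged.
        return observation
    if not isinstance(observation, dict):
        raise ValueError('A whitelist requires a dictionary of observations.')
    missing = [key for key in whitelist if key not in observation]
    if missing:
        raise ValueError('Whitelist keys not found: %s' % missing)
    return {key: observation[key] for key in whitelist}


observation = {'position': [1, 2, 3, 4], 'target': [5, 6, 7, 8, 9]}
kept = filter_observations(observation, whitelist=['target'])
print(kept)
```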

batch_size The batch size of the environment.
batched Whether the environment is batched or not.

If the environment supports batched observations and actions, override this property to return True.

A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
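To make the batch dimension concrete, here is a small sketch that packs a batched dict observation one batch entry at a time. It uses nested lists instead of NumPy arrays, and `pack_batched_observation` is a hypothetical helper, not part of the library.

```python
def pack_batched_observation(observation, batch_size):
    """Packs each batch entry separately; the outer index is the batch."""
    rows = []
    for index in range(batch_size):
        row = []
        # Assumption: keys are visited in sorted order, as a fixed layout.
        for key in sorted(observation):
            row.extend(observation[key][index])
        rows.append(row)
    return rows


batched = {
    'position': [[1, 2, 3, 4], [10, 20, 30, 40]],
    'target': [[5, 6, 7, 8, 9], [50, 60, 70, 80, 90]],
}
packed_rows = pack_batched_observation(batched, batch_size=2)
print(len(packed_rows), len(packed_rows[0]))
```

The result has one row per batch entry, while each row has the unbatched length given by the spec.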




Defines the actions that should be provided to step().

May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

An ArraySpec, or a nested dict, list or tuple of ArraySpecs.



Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly

env = Env(...)
# Use env.
env.close()

or via a context manager

with Env(...) as env:
  # Use env.
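A minimal sketch of how the two usage patterns relate, assuming a hypothetical `Env` class whose `__exit__` simply delegates to `close()`. The `closed` flag stands in for a real external resource.

```python
class Env:
    """Hypothetical environment that owns an external resource."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release the resource; here this is only a flag.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # Do not suppress exceptions raised in the block.


with Env() as env:
    pass  # Use env.

print(env.closed)
```

With the context manager, `close()` runs even if the body of the with-statement raises.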



Returns the current timestep.



Returns the environment info returned on the last step.

Info returned by last call to step(). None by default.

NotImplementedError If the environment does not use info.



Returns the state of the environment.

The state contains everything required to restore the environment to the current configuration. This can contain e.g.

  • The current time_step.
  • The number of steps taken in the environment (for finite horizon MDPs).
  • Hidden state (for POMDPs).

Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.

state The current state of the environment.
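The token-style contract can be illustrated with a toy environment. `CounterEnv` and its internal state layout are hypothetical; the point is only that the caller stores the value returned by `get_state()` and hands it back to `set_state()` without inspecting it.

```python
import copy


class CounterEnv:
    """Hypothetical environment whose only state is a step counter."""

    def __init__(self):
        self._steps = 0

    def step(self):
        self._steps += 1
        return self._steps

    def get_state(self):
        # Return a copy so later steps cannot mutate the saved token.
        return copy.deepcopy({'steps': self._steps})

    def set_state(self, state):
        self._steps = state['steps']


env = CounterEnv()
env.step()
token = env.get_state()   # Opaque to the caller.
env.step()
env.step()
env.set_state(token)      # Rewind to the saved point.
next_step = env.step()
print(next_step)
```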



Defines the observations provided by the environment.

An ArraySpec with a shape of the total length of observations kept.



Renders the environment.

mode One of ['rgb_array', 'human']. Renders to a NumPy array, or brings up a window where the environment can be visualized.

An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.

NotImplementedError If the environment does not support rendering.



Starts a new sequence and returns the first TimeStep of this sequence.

A TimeStep namedtuple containing:

  • step_type: A StepType of FIRST.
  • reward: 0.0, indicating the reward.
  • discount: 1.0, indicating the discount.
  • observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().



Seeds the environment.

seed Value to use as seed for the environment.



Restores the environment to a given state.

See definition of state in the documentation for get_state().

state A state to restore the environment to.



Updates the environment according to the action and returns a TimeStep.

If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.

This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.

action A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec().

A TimeStep namedtuple containing:

  • step_type: A StepType value.
  • reward: A NumPy array, reward value for this timestep.
  • discount: A NumPy array, discount in the range [0, 1].
  • observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
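The restart behaviour described for step() can be sketched with a toy environment. Everything here is a stand-in written for illustration: `StepType`, `TimeStep`, and `CountdownEnv` are local definitions that mimic the documented fields, not the library's classes, and the reward values are arbitrary.

```python
import collections
import enum


class StepType(enum.IntEnum):
    FIRST = 0
    MID = 1
    LAST = 2


TimeStep = collections.namedtuple(
    'TimeStep', ['step_type', 'reward', 'discount', 'observation'])


class CountdownEnv:
    """Hypothetical episode that ends after `length` actions."""

    def __init__(self, length=2):
        self._length = length
        self._remaining = 0
        self._current = None

    def reset(self):
        self._remaining = self._length
        self._current = TimeStep(
            StepType.FIRST, 0.0, 1.0, [float(self._remaining)])
        return self._current

    def step(self, action):
        # Start a new sequence if reset was never called or the previous
        # step was LAST; the action is ignored in both cases.
        if self._current is None or self._current.step_type == StepType.LAST:
            return self.reset()
        self._remaining -= 1
        if self._remaining == 0:
            self._current = TimeStep(StepType.LAST, 1.0, 0.0, [0.0])
        else:
            self._current = TimeStep(
                StepType.MID, 0.0, 1.0, [float(self._remaining)])
        return self._current


env = CountdownEnv(length=2)
step_types = [env.step(0).step_type for _ in range(4)]
print([step_type.name for step_type in step_types])
```

Calling step() on the freshly constructed environment yields a FIRST step, and the call after the LAST step starts the next sequence.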



Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(), for example an environment with array-valued rewards.

A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure.





Allows the environment to be used in a with-statement context.



Allows the environment to be used in a with-statement context.