tf_agents.environments.parallel_py_environment.ParallelPyEnvironment

Batch together environments and simulate them in external processes.

Inherits From: PyEnvironment

tf_agents.environments.parallel_py_environment.ParallelPyEnvironment(
    *args, **kwargs
)

The environments are created in external processes by calling the provided callables. This can be an environment class, or a function creating the environment and potentially wrapping it. The returned environment should not access global variables.

Args:

  • env_constructors: List of callables that create environments.
  • start_serially: Whether to start environments serially or in parallel.
  • blocking: Whether to step environments one after another.
  • flatten: Boolean, whether to flatten actions and time_steps during communication to reduce overhead.
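A constructor list can be assembled as shown below. This is a sketch: `FakeEnv` is a hypothetical stand-in for a real `py_environment.PyEnvironment` subclass, used only to illustrate the callable-per-environment pattern.

```python
import functools

# Hypothetical stand-in for a py_environment.PyEnvironment subclass.
class FakeEnv:
    def __init__(self, level=0):
        self.level = level

# Each entry is a callable that builds a fresh environment inside its
# worker process. functools.partial avoids the late-binding pitfall of
# closing over the loop variable with a lambda, and keeps each callable
# self-contained, per the "no global variables" requirement above.
env_constructors = [functools.partial(FakeEnv, level=i) for i in range(4)]

# ParallelPyEnvironment(env_constructors) would spawn one worker per entry;
# here we just invoke the callables directly to show what each produces.
envs = [ctor() for ctor in env_constructors]
```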

Attributes:

  • batch_size: The batch size of the environment.

  • batched: Whether the environment is batched or not.

    If the environment supports batched observations and actions, then overwrite this property to True.

    A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.

    When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
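The batch dimension can be illustrated directly with NumPy (the shapes here are hypothetical, not taken from any particular environment):

```python
import numpy as np

batch_size = 4
obs_shape = (3,)  # per-environment observation_spec shape

# A batched observation stacks one observation per environment along a
# new leading axis; that leading axis is not part of observation_spec.
single_obs = np.zeros(obs_shape, dtype=np.float32)
batched_obs = np.stack([single_obs] * batch_size)

print(batched_obs.shape)  # (4, 3)
```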

Raises:

  • ValueError: If the action or observation specs don't match.

Methods

__enter__

__enter__()

Allows the environment to be used in a with-statement context.

__exit__

__exit__(
    unused_exception_type, unused_exc_value, unused_traceback
)

Allows the environment to be used in a with-statement context.
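Because `__enter__` and `__exit__` are defined, the environment's worker processes can be managed with a `with` statement. A minimal sketch of the same protocol, with a stub standing in for the real class:

```python
# Stub mimicking the context-manager protocol ParallelPyEnvironment exposes.
class StubParallelEnv:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Worker processes are shut down when the block exits.
        self.close()

env = StubParallelEnv()
with env:
    pass  # interact with the environment here
# After the block, close() has been called automatically.
```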

action_spec

action_spec()

Defines the actions that should be provided to step().

May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

Returns:

An ArraySpec, or a nested dict, list or tuple of ArraySpecs.

close

close()

Closes all external processes.

current_time_step

current_time_step()

Returns the current timestep.

get_info

get_info()

Returns the environment info returned on the last step.

Returns:

Info returned by last call to step(). None by default.

Raises:

  • NotImplementedError: If the environment does not use info.

get_state

get_state()

Returns the state of the environment.

The state contains everything required to restore the environment to the current configuration. This can include, for example:

  • The current time_step.
  • The number of steps taken in the environment (for finite horizon MDPs).
  • Hidden state (for POMDPs).

Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.
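The opaque-token contract can be sketched with a stub environment whose state is just a pickled step counter. This is purely illustrative; the real state format is unspecified and must not be relied upon.

```python
import pickle

class CounterEnv:
    """Stub environment whose entire state is a step counter."""
    def __init__(self):
        self._steps = 0

    def step(self):
        self._steps += 1

    def get_state(self):
        # Opaque token: callers must not inspect or interpret it.
        return pickle.dumps(self._steps)

    def set_state(self, state):
        self._steps = pickle.loads(state)

env = CounterEnv()
env.step()
env.step()
token = env.get_state()   # snapshot after two steps
env.step()
env.set_state(token)      # restore the configuration at snapshot time
print(env._steps)  # 2
```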

Returns:

  • state: The current state of the environment.

observation_spec

observation_spec()

Defines the observations provided by the environment.

May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

Returns:

An ArraySpec, or a nested dict, list or tuple of ArraySpecs.

render

render(
    mode='rgb_array'
)

Renders the environment.

Args:

  • mode: One of ['rgb_array', 'human']. Renders to a NumPy array, or brings up a window where the environment can be visualized.

Returns:

An ndarray of shape [width, height, 3] denoting an RGB image if mode is 'rgb_array'. Otherwise, renders directly to a display window and returns nothing.

Raises:

  • NotImplementedError: If the environment does not support rendering.

reset

reset()

Starts a new sequence and returns the first TimeStep of this sequence.

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType of FIRST.
  • reward: 0.0, indicating the reward.
  • discount: 1.0, indicating the discount.
  • observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
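The structure of the first TimeStep can be illustrated with a plain namedtuple. This is a sketch: TF-Agents provides the real TimeStep and StepType in `tf_agents.trajectories.time_step`, and the observation shape here is hypothetical.

```python
import collections
import numpy as np

# Plain namedtuple mirroring the fields described above.
TimeStep = collections.namedtuple(
    'TimeStep', ['step_type', 'reward', 'discount', 'observation'])

FIRST = 0  # stand-in for StepType.FIRST

# What reset() conceptually returns for a single (unbatched) environment.
first_step = TimeStep(
    step_type=FIRST,
    reward=np.float32(0.0),
    discount=np.float32(1.0),
    observation=np.zeros((3,), dtype=np.float32),
)
```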

reward_spec

reward_spec()

Defines the rewards that are returned by step().

Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.

Returns:

An ArraySpec, or a nested dict, list or tuple of ArraySpecs.

seed

seed(
    seeds
)

Seeds the parallel environments.
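Seeding takes one seed per sub-environment. A sketch of the fan-out, with stub environments in place of the worker processes the real call forwards to:

```python
import random

class StubEnv:
    """Stub sub-environment that keeps its own RNG."""
    def seed(self, value):
        self._rng = random.Random(value)

envs = [StubEnv() for _ in range(3)]
seeds = [11, 22, 33]  # one entry per environment

# Conceptually what ParallelPyEnvironment.seed(seeds) does: forward each
# seed to its matching sub-environment.
for env, s in zip(envs, seeds):
    env.seed(s)
```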

set_state

set_state(
    state
)

Restores the environment to a given state.

See definition of state in the documentation for get_state().

Args:

  • state: A state to restore the environment to.

start

start()

Starts the external environment processes, serially or in parallel depending on start_serially.

step

step(
    action
)

Updates the environment according to the action and returns a TimeStep.

If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence, ignoring action.

This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.

Args:

  • action: A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec().

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType value.
  • reward: A NumPy array, the reward for this timestep.
  • discount: A NumPy array, discount in the range [0, 1].
  • observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
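The auto-reset contract (stepping after a LAST step starts a new sequence and ignores the action) can be sketched with a stub environment whose episodes last exactly two steps. Everything here is hypothetical scaffolding, not the real implementation.

```python
FIRST, MID, LAST = 0, 1, 2  # stand-ins for StepType values

class TwoStepEnv:
    """Stub: episodes last two steps, then auto-reset on the next step()."""
    def __init__(self):
        self._t = None  # reset() not yet called

    def reset(self):
        self._t = 0
        return FIRST

    def step(self, action):
        # First call ever, or previous step ended the episode: start a new
        # sequence and ignore the action, as the docstring specifies.
        if self._t is None or self._t >= 2:
            return self.reset()
        self._t += 1
        return LAST if self._t == 2 else MID

env = TwoStepEnv()
step_types = [env.step(0) for _ in range(4)]
print(step_types)  # [0, 1, 2, 0] i.e. FIRST, MID, LAST, FIRST
```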

time_step_spec

time_step_spec()

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with array-valued rewards.

Returns:

A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure.