tf_agents.environments.wrappers.TimeLimit

Ends episodes after a specified number of steps.

Inherits From: PyEnvironmentBaseWrapper

tf_agents.environments.wrappers.TimeLimit(
    *args, **kwargs
)
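The time-limit behaviour can be illustrated without the library. The following is a minimal sketch in plain Python, not the tf_agents implementation: `CountingEnv`, `TimeLimitSketch`, and the simplified `TimeStep` tuple are hypothetical stand-ins, and the assumption that the wrapper takes an environment plus a `duration` is inferred from the `duration` attribute of this class.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a TimeStep (the real one also has a discount).
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "observation"])
FIRST, MID, LAST = "FIRST", "MID", "LAST"


class CountingEnv:
    """Toy environment that never ends an episode on its own."""

    def reset(self):
        self._count = 0
        return TimeStep(FIRST, 0.0, self._count)

    def step(self, action):
        self._count += action
        return TimeStep(MID, 1.0, self._count)


class TimeLimitSketch:
    """Marks the time step as LAST once `duration` steps have been taken."""

    def __init__(self, env, duration):
        self._env = env
        self._duration = duration
        self._num_steps = None  # None means "no episode in progress".

    def reset(self):
        self._num_steps = 0
        return self._env.reset()

    def step(self, action):
        if self._num_steps is None:
            # Episode ended (or never started): start a new one, ignore action.
            return self.reset()
        time_step = self._env.step(action)
        self._num_steps += 1
        if self._num_steps >= self._duration:
            time_step = time_step._replace(step_type=LAST)
        if time_step.step_type == LAST:
            self._num_steps = None
        return time_step


env = TimeLimitSketch(CountingEnv(), duration=3)
step_types = [env.reset().step_type]
for _ in range(4):
    step_types.append(env.step(1).step_type)
print(step_types)
```

In this sketch the third step is relabelled as LAST, and the following call to step() starts a new episode instead of applying the action.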

Attributes:

  • batch_size: The batch size of the environment.

  • batched: Whether the environment is batched or not.

    If the environment supports batched observations and actions, override this property to return True.

    A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.

    When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

  • duration

Methods

__enter__

__enter__()

Allows the environment to be used in a with-statement context.

__exit__

__exit__(
    unused_exception_type, unused_exc_value, unused_traceback
)

Allows the environment to be used in a with-statement context.
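As a rough illustration of the with-statement protocol, the sketch below uses a hypothetical `ClosableEnv` class rather than a real environment. It assumes that `__exit__` releases resources by calling `close()`, which matches how close() is described as usable via a context manager, but it is not taken from the library source.

```python
class ClosableEnv:
    """Hypothetical environment that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # Return the environment itself so `with ... as env` binds it.
        return self

    def __exit__(self, unused_exception_type, unused_exc_value, unused_traceback):
        # Assumed behaviour: leaving the block frees the environment's resources.
        self.close()


with ClosableEnv() as env:
    was_open_inside = not env.closed

print(was_open_inside, env.closed)
```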

action_spec

action_spec()

Defines the actions that should be provided to step().

May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

Returns:

An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
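To make the idea of a nested spec with bounds concrete, here is a small sketch in plain Python. `BoundedScalarSpec` and `check_action` are hypothetical names invented for this illustration; they are not tf_agents classes and only handle scalar values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedScalarSpec:
    """Hypothetical scalar spec with inclusive bounds (not a tf_agents class)."""
    name: str
    minimum: float
    maximum: float

    def is_valid(self, value):
        return self.minimum <= value <= self.maximum


# A nested (dict) action spec, mirroring "a nested dict, list or tuple of specs".
action_spec = {
    "steering": BoundedScalarSpec("steering", -1.0, 1.0),
    "throttle": BoundedScalarSpec("throttle", 0.0, 1.0),
}


def check_action(spec, action):
    """Returns True if `action` has the same keys as `spec` and all values are in bounds."""
    return spec.keys() == action.keys() and all(
        spec[key].is_valid(action[key]) for key in spec)


valid = check_action(action_spec, {"steering": 0.5, "throttle": 0.2})
invalid = check_action(action_spec, {"steering": 2.0, "throttle": 0.2})
print(valid, invalid)
```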

close

close()

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
  # Use env.

current_time_step

current_time_step()

Returns the current timestep.

get_info

get_info()

Returns the environment info returned on the last step.

Returns:

Info returned by last call to step(). None by default.

Raises:

  • NotImplementedError: If the environment does not use info.

observation_spec

observation_spec()

Defines the observations provided by the environment.

May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

Returns:

An ArraySpec, or a nested dict, list or tuple of ArraySpecs.

render

render(
    mode='rgb_array'
)

Renders the environment.

Args:

  • mode: One of ['rgb_array', 'human']. Renders to a NumPy array, or brings up a window where the environment can be visualized.

Returns:

An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.

Raises:

  • NotImplementedError: If the environment does not support rendering.

reset

reset()

Starts a new sequence and returns the first TimeStep of this sequence.

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType of FIRST.

  • reward: 0.0, indicating the reward.

  • discount: 1.0, indicating the discount.

  • observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
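The shape of this first TimeStep can be sketched with the standard library alone. The `StepType` enum, the `TimeStep` namedtuple, and the `restart` helper below are simplified stand-ins written for this example (the field names follow the description above); they are not the library's own definitions.

```python
from collections import namedtuple
from enum import Enum


class StepType(Enum):
    """Stand-in for the step types that can appear in a sequence."""
    FIRST = 0
    MID = 1
    LAST = 2


TimeStep = namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"])


def restart(observation):
    """Builds the first TimeStep of a sequence: FIRST, reward 0.0, discount 1.0."""
    return TimeStep(StepType.FIRST, 0.0, 1.0, observation)


first = restart([0.0, 0.0])
print(first)
```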

seed

seed(
    seed
)

Seeds the environment.

Args:

  • seed: Value to use as seed for the environment.

step

step(
    action
)

Updates the environment according to the action and returns a TimeStep.

If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.

This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.

Args:

  • action: A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec().

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType value.

  • reward: A NumPy array, reward value for this timestep.

  • discount: A NumPy array, discount in the range [0, 1].

  • observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
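The two reset rules for step() (a call before any reset, and a call after a LAST step) can be sketched as follows. `EpisodicEnvSketch` is a hypothetical toy class that returns plain `(step_type, observation)` tuples; it only illustrates the control flow and is not the library's implementation.

```python
class EpisodicEnvSketch:
    """Toy environment whose episode ends once its counter reaches a limit."""

    def __init__(self, episode_length):
        self._episode_length = episode_length
        self._current = None  # Last returned (step_type, observation), if any.

    def reset(self):
        self._t = 0
        self._current = ("FIRST", self._t)
        return self._current

    def step(self, action):
        # Start a new sequence, ignoring `action`, if reset() was never called
        # or if the previous step ended the episode.
        if self._current is None or self._current[0] == "LAST":
            return self.reset()
        self._t += action
        step_type = "LAST" if self._t >= self._episode_length else "MID"
        self._current = (step_type, self._t)
        return self._current


env = EpisodicEnvSketch(episode_length=2)
trace = [env.step(1) for _ in range(5)]
print(trace)
```

Here the first call to step() behaves like reset(), and so does the call that follows the LAST step; in both cases the action has no effect on the observation.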

time_step_spec

time_step_spec()

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with array-valued rewards.

Returns:

A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure.

wrapped_env

wrapped_env()