tf_agents.bandits.environments.piecewise_stochastic_environment.PiecewiseStochasticEnvironment

Implements a piecewise stationary linear environment.

Inherits From: NonStationaryStochasticEnvironment

tf_agents.bandits.environments.piecewise_stochastic_environment.PiecewiseStochasticEnvironment(
    *args, **kwargs
)

Args:

  • observation_distribution: A distribution from tfp.distributions with shape [batch_size, observation_dim]. Note that the values of batch_size and observation_dim are deduced from the distribution.
  • interval_distribution: A scalar distribution from tfp.distributions. The value is cast to int64 to update the time range.
  • observation_to_reward_distribution: A distribution from tfp.distributions with shape [observation_dim, num_actions]. The value observation_dim must match the second dimension of observation_distribution.
  • additive_reward_distribution: A distribution from tfp.distributions with shape [num_actions]. This models the non-contextual behavior of the bandit.
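
The following is a minimal construction sketch, not taken from this page: it assumes TensorFlow, TensorFlow Probability, and TF-Agents are installed, and the values of batch_size, observation_dim, and num_actions are purely illustrative.

import tensorflow as tf
import tensorflow_probability as tfp
from tf_agents.bandits.environments import piecewise_stochastic_environment

tfd = tfp.distributions
batch_size, observation_dim, num_actions = 8, 4, 3  # Illustrative sizes.

env = piecewise_stochastic_environment.PiecewiseStochasticEnvironment(
    # Contexts: one observation_dim-vector per batch element.
    observation_distribution=tfd.Normal(
        loc=tf.zeros([batch_size, observation_dim]), scale=1.0),
    # Length (in steps) of each stationary piece; cast to int64 internally.
    interval_distribution=tfd.Uniform(low=50.0, high=100.0),
    # Linear mapping from observations to per-arm rewards.
    observation_to_reward_distribution=tfd.Normal(
        loc=tf.zeros([observation_dim, num_actions]), scale=1.0),
    # Non-contextual (additive) per-arm reward component.
    additive_reward_distribution=tfd.Normal(
        loc=tf.zeros([num_actions]), scale=0.1))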

Attributes:

  • batch_size
  • batched
  • environment_dynamics

Methods

action_spec

action_spec()

Describes the specs of the Tensors expected by step(action).

action can be a single Tensor, or a nested dict, list or tuple of Tensors.

Returns:

A single TensorSpec, or a nested dict, list or tuple of TensorSpec objects, which describe the shape and dtype of each Tensor expected by step().
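
As a sketch (assuming the env constructed in the earlier example), the spec can be inspected directly; for a bandit environment it is expected, though not stated on this page, to be a scalar integer spec over the arm indices:

action_spec = env.action_spec()              # Spec of the action expected by step().
print(action_spec.shape, action_spec.dtype)  # Shape and dtype of a valid action.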

current_time_step

current_time_step()

Returns the current TimeStep.

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType value.
  • reward: Reward at this time_step.
  • discount: A discount in the range [0, 1].
  • observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to observation_spec().

observation_spec

observation_spec()

Defines the TensorSpec of observations provided by the environment.

Returns:

A TensorSpec, or a nested dict, list or tuple of TensorSpec objects, which describe the observation.
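
A short inspection sketch, again assuming the env from the construction example; the reported shape is expected to match the per-element observation_dim used in observation_distribution (an assumption, since batched environments usually report specs without the batch dimension):

observation_spec = env.observation_spec()
print(observation_spec.shape, observation_spec.dtype)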

render

render()

Renders a frame from the environment.

Raises:

  • NotImplementedError: If the environment does not support rendering.

reset

reset()

Resets the environment and returns the current time_step.

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType value.
  • reward: Reward at this time_step.
  • discount: A discount in the range [0, 1].
  • observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to observation_spec().
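
A minimal reset sketch, assuming the env, batch_size, and observation_dim from the construction example above:

time_step = env.reset()
print(time_step.step_type)    # One StepType entry per batch element.
print(time_step.observation)  # Expected shape: [batch_size, observation_dim].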

step

step(
    action
)

Steps the environment according to the action.

If the environment returned a TimeStep with StepType.LAST at the previous step, this call to step should reset the environment (it is expected that whoever implements this method calls reset in that case), start a new sequence, and ignore the action.

This method will also start a new sequence if called after the environment has been constructed and reset() has not been called. In this case action will be ignored.

Expected sequences look like:

time_step -> action -> next_time_step

The action should depend on the previous time_step for correctness.

Args:

  • action: A Tensor, or a nested dict, list or tuple of Tensors corresponding to action_spec().

Returns:

A TimeStep namedtuple containing:

  • step_type: A StepType value.
  • reward: Reward at this time_step.
  • discount: A discount in the range [0, 1].
  • observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to observation_spec().
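
A minimal interaction-loop sketch. It assumes the env, batch_size, and num_actions from the construction example, and that the action is a batched integer arm index of dtype int32, which is an assumption about this class rather than something stated on this page:

import tensorflow as tf

time_step = env.reset()
for _ in range(5):
  # Pick a random arm per batch element; a real agent would condition the
  # choice on time_step.observation, as noted above.
  action = tf.random.uniform(
      [batch_size], minval=0, maxval=num_actions, dtype=tf.int32)
  time_step = env.step(action)
  print(time_step.reward)  # Reward for the chosen arms, one per batch element.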

time_step_spec

time_step_spec()

Describes the TimeStep specs of Tensors returned by step().

Returns:

A TimeStep namedtuple containing TensorSpec objects defining the Tensors returned by step(), i.e. (step_type, reward, discount, observation).
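
A final inspection sketch, assuming the env from the construction example; the observation entry is expected to match env.observation_spec():

spec = env.time_step_spec()
print(spec.step_type)    # Spec of the StepType field.
print(spec.reward)       # Spec of the reward field.
print(spec.discount)     # Spec of the discount field.
print(spec.observation)  # Spec of the observation field.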