tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearDynamics

Dynamics of a drifting linear environment.

Inherits From: EnvironmentDynamics

tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearDynamics(
    *args, **kwargs
)

This is a drifting linear environment that computes rewards as:

rewards(t) = observation(t) * observation_to_reward(t) + additive_reward(t)

where t is the environment time. The parameters observation_to_reward and additive_reward are updated at each time step, so observation_to_reward slowly rotates over time; the environment time itself is incremented in the base class after the reward is computed. To preserve the norm of observation_to_reward (and hence the range of reward values), the drift is applied in the form of rotations, i.e.,

observation_to_reward(t) = R(theta(t)) * observation_to_reward(t - 1)

where theta is the angle of the rotation. The angle is sampled from a provided input distribution.
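To make the rotation update concrete, here is a minimal NumPy sketch (not the library's implementation) for a hypothetical 2-dimensional observation with 3 actions, showing that applying R(theta) leaves the norm of each column of observation_to_reward unchanged:

```python
import numpy as np

def rotation_matrix(theta):
    """2-D rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical parameter matrix: observation_dim=2, num_actions=3.
observation_to_reward = np.random.randn(2, 3)

# One drift step: rotate every column by a sampled angle theta.
# The uniform draw below merely stands in for drift_distribution.
theta = np.random.uniform(-0.1, 0.1)
drifted = rotation_matrix(theta) @ observation_to_reward

# Rotations are orthogonal, so column norms (and hence the reward range)
# are preserved.
print(np.allclose(np.linalg.norm(drifted, axis=0),
                  np.linalg.norm(observation_to_reward, axis=0)))  # True
```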

Args:

  • observation_distribution: A distribution from tfp.distributions with shape [batch_size, observation_dim]. Note that the values of batch_size and observation_dim are deduced from the distribution.
  • observation_to_reward_distribution: A distribution from tfp.distributions with shape [observation_dim, num_actions]. The value observation_dim must match the second dimension of observation_distribution.
  • drift_distribution: A scalar distribution from tfp.distributions of type tf.float32. It represents the angle of rotation.
  • additive_reward_distribution: A distribution from tfp.distributions with shape [num_actions]. This models the non-contextual behavior of the bandit.
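The argument shapes fit together as follows. The sketch below uses plain NumPy arrays with made-up sizes as stand-ins for single samples from the tfp distributions above, and simulates the reward computation of one environment step:

```python
import numpy as np

batch_size, observation_dim, num_actions = 4, 2, 3

# Stand-ins for one sample from each distribution argument.
observation = np.random.randn(batch_size, observation_dim)             # observation_distribution
observation_to_reward = np.random.randn(observation_dim, num_actions)  # observation_to_reward_distribution
additive_reward = np.random.randn(num_actions)                         # additive_reward_distribution

# rewards(t) = observation(t) * observation_to_reward(t) + additive_reward(t)
rewards = observation @ observation_to_reward + additive_reward
print(rewards.shape)  # (4, 3): one reward per (batch element, action)
```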

Attributes:

  • action_spec: Specification of the actions.
  • batch_size: Returns the batch size used for observations and rewards.
  • name: Returns the name of this module as passed or determined in the ctor.

    NOTE: This is not the same as the self.name_scope.name which includes parent module names.

  • name_scope: Returns a tf.name_scope instance for this class.

  • observation_spec: Specification of the observations.

  • submodules: Sequence of all sub-modules.

    Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).

# Nested modules: b is a property of a, and c is a property of b,
# so both b and c are submodules of a.
a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []
  • trainable_variables: Sequence of trainable variables owned by this module and its submodules.

  • variables: Sequence of variables owned by this module and its submodules.

Methods

compute_optimal_action

compute_optimal_action(
    *args, **kwargs
)

compute_optimal_reward

compute_optimal_reward(
    *args, **kwargs
)
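The page does not document these two methods, but for a linear bandit the optimal action is presumably the arm with the highest expected reward under the current parameters. A hedged NumPy sketch under that assumption (all array names and sizes here are illustrative, not the library's internals):

```python
import numpy as np

observation = np.random.randn(4, 2)            # [batch_size, observation_dim]
observation_to_reward = np.random.randn(2, 3)  # [observation_dim, num_actions]
additive_reward = np.random.randn(3)           # [num_actions]

expected_rewards = observation @ observation_to_reward + additive_reward

# Optimal action: the arm with the highest expected reward per batch element.
optimal_action = np.argmax(expected_rewards, axis=-1)  # shape [batch_size]
# Optimal reward: the corresponding maximum expected value.
optimal_reward = np.max(expected_rewards, axis=-1)     # shape [batch_size]
```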

observation

observation(
    unused_t
)

Returns an observation batch for the given time.

Args:

  • unused_t: The scalar int64 tensor of the environment time step, incremented by the environment after the reward is computed. As the parameter name suggests, this dynamics ignores it: observations are sampled from a fixed distribution that does not depend on time.

Returns:

The observation batch with spec according to observation_spec.

reward

reward(
    observation, t
)

Reward for the given observation and time step.

Args:

  • observation: A batch of observations with spec according to observation_spec.
  • t: The scalar int64 tensor of the environment time step. This is incremented by the environment after the reward is computed.

Returns:

A batch of rewards with spec shape [batch_size, num_actions] containing rewards for all arms.

with_name_scope

@classmethod
with_name_scope(
    cls, method
)

Decorator to automatically enter the module name scope.

class MyModule(tf.Module):
  @tf.Module.with_name_scope
  def __call__(self, x):
    if not hasattr(self, 'w'):
      self.w = tf.Variable(tf.random.normal([x.shape[1], 64]))
    return tf.matmul(x, self.w)

Using the above module produces tf.Variable and tf.Tensor objects whose names include the module name:

mod = MyModule()
mod(tf.ones([8, 32]))
# ==> <tf.Tensor: ...>
mod.w
# ==> <tf.Variable ...'my_module/w:0'>

Args:

  • method: The method to wrap.

Returns:

The original method wrapped such that it enters the module's name scope.