tf_agents.replay_buffers.replay_buffer.ReplayBuffer

Abstract base class for TF-Agents replay buffer.

tf_agents.replay_buffers.replay_buffer.ReplayBuffer(
    data_spec, capacity, stateful_dataset=False
)

In eager mode, methods modify the buffer or return values directly. In graph mode, methods return ops that do so when executed.

Args:

  • data_spec: A spec or a list/tuple/nest of specs describing a single item that can be stored in this buffer.
  • capacity: Number of elements that the replay buffer can hold.
  • stateful_dataset: Whether the dataset contains stateful ops or not.
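
ReplayBuffer itself is abstract, so in practice you instantiate a concrete subclass. A minimal sketch, assuming the TFUniformReplayBuffer subclass and its batch_size/max_length arguments (illustrative values; check your TF-Agents version):

import tensorflow as tf
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# Spec describing a single item stored in the buffer (no batch dimension).
data_spec = (
    tf.TensorSpec([4], tf.float32, name='observation'),
    tf.TensorSpec([], tf.int32, name='action'),
)

# batch_size is the number of parallel environments whose frames are added
# together; max_length is the per-environment capacity (assumed arguments).
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=data_spec,
    batch_size=2,
    max_length=1000)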

Attributes:

  • capacity: Returns the capacity of the replay buffer.
  • data_spec: Returns the spec for items in the replay buffer.
  • name: Returns the name of this module as passed or determined in the constructor.

    NOTE: This is not the same as the self.name_scope.name which includes parent module names.

  • name_scope: Returns a tf.name_scope instance for this class.

  • stateful_dataset: Returns whether the dataset of the replay buffer has stateful ops.

  • submodules: Sequence of all sub-modules.

    Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).

a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []
  • trainable_variables: Sequence of trainable variables owned by this module and its submodules.

  • variables: Sequence of variables owned by this module and its submodules.

Methods

add_batch

add_batch(
    items
)

Adds a batch of items to the replay buffer.

Args:

  • items: An item or list/tuple/nest of items to be added to the replay buffer. items must match the data_spec of this class, with a batch_size dimension added to the beginning of each tensor/array.

Returns:

An op that adds the items to the replay buffer when executed (graph mode); in eager mode, the items are added directly.
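
Continuing the illustrative buffer sketched above (batch_size=2, observation/action specs), each tensor gets a leading batch dimension when it is added:

# One frame from each of the 2 parallel environments, matching data_spec
# with a leading batch dimension of 2.
observations = tf.ones([2, 4], dtype=tf.float32)
actions = tf.constant([0, 1], dtype=tf.int32)
replay_buffer.add_batch((observations, actions))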

as_dataset

as_dataset(
    sample_batch_size=None, num_steps=None, num_parallel_calls=None,
    single_deterministic_pass=False
)

Creates and returns a dataset that returns entries from the buffer.

A single entry from the dataset is equivalent to one output from get_next(sample_batch_size=sample_batch_size, num_steps=num_steps).

Args:

  • sample_batch_size: (Optional.) An optional batch_size to specify the number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the rest of the dimensions match the corresponding data_spec.
  • num_steps: (Optional.) Optional way to specify that sub-episodes are desired. If None (default), a batch of single items is returned. Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay_buffer. The returned tensors will have first dimension equal to sample_batch_size (if sample_batch_size is not None), subsequent dimension equal to num_steps, and remaining dimensions which match the data_spec of this class.
  • num_parallel_calls: (Optional.) A tf.int32 scalar tf.Tensor, representing the number of elements to process in parallel. If not specified, elements will be processed sequentially.
  • single_deterministic_pass: Python boolean. If True, the dataset will return a single deterministic pass through its underlying data. NOTE: If the buffer is modified while a Dataset iterator is iterating over this data, the iterator may miss any new data or otherwise have subtly invalid data.

Returns:

A dataset of type tf.data.Dataset, elements of which are 2-tuples of:

  • An item or sequence of items or batch thereof.
  • Auxiliary info for the items (i.e. ids, probs).

Raises:

  • NotImplementedError: If a non-default argument value is not supported.
  • ValueError: If the data spec contains lists that must be converted to tuples.
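
An illustrative way to consume the illustrative replay_buffer from the constructor sketch above as a dataset (the batch size, step count, and use of AUTOTUNE are assumptions, not requirements):

# Sample mini-batches of 32 two-step sub-episodes, processed in parallel.
dataset = replay_buffer.as_dataset(
    sample_batch_size=32,
    num_steps=2,
    num_parallel_calls=tf.data.experimental.AUTOTUNE)

iterator = iter(dataset)
trajectories, buffer_info = next(iterator)
# Each tensor in trajectories has shape [32, 2, ...] matching data_spec;
# buffer_info carries the auxiliary ids/probs.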

clear

clear()

Resets the contents of the replay buffer.

Returns:

An op that clears the replay buffer contents when executed (graph mode); in eager mode, the buffer is cleared directly.

gather_all

gather_all()

Returns all the items in the buffer.

NOTE: This method will soon be deprecated in favor of as_dataset(..., single_deterministic_pass=True).

Returns:

All the items currently in the buffer, as a tensor of shape [B, T, ...], where B = batch size, T = timesteps, and the remaining dimensions match the shape spec of the items in the buffer.
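
Since this method is slated for deprecation, the suggested replacement is a single deterministic pass via as_dataset; a minimal sketch, reusing the illustrative replay_buffer from above:

# Iterate once, in order, over everything currently stored in the buffer.
dataset = replay_buffer.as_dataset(single_deterministic_pass=True)
for item, buffer_info in dataset:
    pass  # process each stored item (shapes match data_spec)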

get_next

get_next(
    sample_batch_size=None, num_steps=None, time_stacked=True
)

Returns an item or batch of items from the buffer.

Args:

  • sample_batch_size: (Optional.) An optional batch_size to specify the number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the rest of the dimensions match the corresponding data_spec. See examples below.
  • num_steps: (Optional.) Optional way to specify that sub-episodes are desired. If None (default), in non-episodic replay buffers, a batch of single items is returned. In episodic buffers, full episodes are returned (note that sample_batch_size must be None in that case). Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay_buffer. The returned tensors will have first dimension equal to sample_batch_size (if sample_batch_size is not None), subsequent dimension equal to num_steps (if time_stacked=True), and remaining dimensions which match the data_spec of this class. See examples below.
  • time_stacked: (Optional.) Boolean, when True and num_steps > 1 it returns the items stacked on the time dimension. See examples below for details.

Examples of tensor shapes returned (B = batch size, T = timesteps, D = data spec):

get_next(sample_batch_size=None, num_steps=None, time_stacked=True)
  return shape (non-episodic): [D]
  return shape (episodic): [T, D]
get_next(sample_batch_size=B, num_steps=None, time_stacked=True)
  return shape (non-episodic): [B, D]
  return shape (episodic): Not supported
get_next(sample_batch_size=B, num_steps=T, time_stacked=True)
  return shape: [B, T, D]
get_next(sample_batch_size=None, num_steps=T, time_stacked=False)
  return shape: ([D], [D], ...) with T tensors in the tuple
get_next(sample_batch_size=B, num_steps=T, time_stacked=False)
  return shape: ([B, D], [B, D], ...) with T tensors in the tuple

Returns:

A 2-tuple containing:

  • An item or sequence of (optionally batched and stacked) items.
  • Auxiliary info for the items (i.e. ids, probs).
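
An illustrative call against the replay_buffer sketched above, matching the shape table in the time_stacked examples (B=32 and T=2 are assumptions):

# Sample a batch of 32 sub-episodes of 2 consecutive steps each.
items, buffer_info = replay_buffer.get_next(
    sample_batch_size=32, num_steps=2, time_stacked=True)
# Each tensor in items has shape [32, 2, ...]; buffer_info holds ids/probs.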

num_frames

num_frames()

Returns the number of frames in the replay buffer.

with_name_scope

@classmethod
with_name_scope(
    cls, method
)

Decorator to automatically enter the module name scope.

class MyModule(tf.Module):
  @tf.Module.with_name_scope
  def __call__(self, x):
    if not hasattr(self, 'w'):
      self.w = tf.Variable(tf.random.normal([x.shape[1], 64]))
    return tf.matmul(x, self.w)

Using the above module would produce tf.Variables and tf.Tensors whose names included the module name:

mod = MyModule()
mod(tf.ones([8, 32]))
# ==> <tf.Tensor: ...>
mod.w
# ==> <tf.Variable ...'my_module/w:0'>

Args:

  • method: The method to wrap.

Returns:

The original method wrapped such that it enters the module's name scope.