tf_agents.replay_buffers.py_hashed_replay_buffer.PyHashedReplayBuffer

View source on GitHub

A Python-based replay buffer with optimized underlying storage.

Inherits From: PyUniformReplayBuffer

tf_agents.replay_buffers.py_hashed_replay_buffer.PyHashedReplayBuffer(
    data_spec, capacity, log_interval=None
)

This replay buffer deduplicates data in the stored trajectories along the last axis of the observation, which is useful, e.g., when you are performing frame stacking. For example, if each observation is 4 stacked 84x84 grayscale images with shape [84, 84, 4], the replay buffer separates out the individual images and deduplicates them across trajectories in case an image is repeated.
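The storage-saving idea can be sketched in plain Python. This is an illustrative model only, not the buffer's actual implementation; the helper name and the use of SHA-256 hashing are assumptions made for the sketch:

```python
import hashlib
import numpy as np

def store_stacked_observation(frame_table, obs):
    """Store a stacked observation as per-frame hash references.

    frame_table maps a frame's hash to its array, so a frame shared by
    several stacked observations is kept only once.
    """
    keys = []
    for i in range(obs.shape[-1]):          # split along the last axis
        frame = np.ascontiguousarray(obs[..., i])
        key = hashlib.sha256(frame.tobytes()).hexdigest()
        frame_table.setdefault(key, frame)  # deduplicate: each unique frame stored once
        keys.append(key)
    return keys

# Two consecutive 4-frame stacks share 3 of their frames.
frames = [np.full((84, 84), v, dtype=np.uint8) for v in range(5)]
obs_t = np.stack(frames[0:4], axis=-1)   # frames 0..3
obs_t1 = np.stack(frames[1:5], axis=-1)  # frames 1..4

table = {}
store_stacked_observation(table, obs_t)
store_stacked_observation(table, obs_t1)
print(len(table))  # 5 unique frames instead of 8 stored frames
```

With naive storage the two observations would hold 8 frames; hashing per frame keeps only the 5 unique ones.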

Args:

  • data_spec: An ArraySpec or a list/tuple/nest of ArraySpecs describing a single item that can be stored in this buffer.
  • capacity: The maximum number of items that can be stored in the buffer.

Attributes:

  • capacity: Returns the capacity of the replay buffer.
  • data_spec: Returns the spec for items in the replay buffer.
  • name: Returns the name of this module as passed or determined in the ctor.

    NOTE: This is not the same as the self.name_scope.name which includes parent module names.

  • name_scope: Returns a tf.name_scope instance for this class.

  • size: Returns the current number of items in the buffer.

  • stateful_dataset: Returns whether the dataset of the replay buffer has stateful ops.

  • submodules: Sequence of all sub-modules.

    Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).

import tensorflow as tf

a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []
  • trainable_variables: Sequence of trainable variables owned by this module and its submodules.

  • variables: Sequence of variables owned by this module and its submodules.

Methods

add_batch


add_batch(
    items
)

Adds a batch of items to the replay buffer.

Args:

  • items: An item or list/tuple/nest of items to be added to the replay buffer. items must match the data_spec of this class, with a batch_size dimension added to the beginning of each tensor/array.
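The leading batch dimension can be illustrated with a small NumPy shape check (the [84, 84, 4] observation shape is a hypothetical data_spec, not part of this API):

```python
import numpy as np

spec_shape = (84, 84, 4)   # shape of one item, per a hypothetical data_spec
batch_size = 2

# A batch of items: the batch_size dimension is prepended to the spec shape.
items = np.zeros((batch_size,) + spec_shape, dtype=np.uint8)
assert items.shape == (2, 84, 84, 4)
```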

Returns:

None. The items are added to the replay buffer as a side effect.

as_dataset


as_dataset(
    sample_batch_size=None, num_steps=None, num_parallel_calls=None,
    single_deterministic_pass=False
)

Creates and returns a dataset that returns entries from the buffer.

A single entry from the dataset is equivalent to one output from get_next(sample_batch_size=sample_batch_size, num_steps=num_steps).

Args:

  • sample_batch_size: (Optional.) The number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the remaining dimensions match the corresponding data_spec.
  • num_steps: (Optional.) A way to specify that sub-episodes are desired. If None (default), a batch of single items is returned. Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay buffer. The returned tensors will have a first dimension equal to sample_batch_size (if sample_batch_size is not None), a subsequent dimension equal to num_steps, and remaining dimensions that match the data_spec of this class.
  • num_parallel_calls: (Optional.) A tf.int32 scalar tf.Tensor representing the number of elements to process in parallel. If not specified, elements are processed sequentially.
  • single_deterministic_pass: Python boolean. If True, the dataset returns a single deterministic pass through its underlying data. NOTE: If the buffer is modified while a Dataset iterator is iterating over this data, the iterator may miss any new data or otherwise have subtly invalid data.
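The interaction of sample_batch_size and num_steps can be modeled in plain NumPy. This is a sketch of the element shapes only, not the buffer's real sampling code; the helper name and the scalar item spec are assumptions:

```python
import numpy as np

buffer_data = np.arange(10)  # 10 stored scalar items, in insertion order

def sample_element(rng, num_steps=None, sample_batch_size=None):
    """Mimic the shape of one dataset element: [B, T] for batched sub-episodes."""
    def one():
        if num_steps is None:
            return buffer_data[rng.integers(len(buffer_data))]
        # A sub-episode is num_steps consecutive items.
        start = rng.integers(len(buffer_data) - num_steps + 1)
        return buffer_data[start:start + num_steps]
    if sample_batch_size is None:
        return one()
    # Batching prepends the sample_batch_size dimension.
    return np.stack([one() for _ in range(sample_batch_size)])

rng = np.random.default_rng(0)
elem = sample_element(rng, num_steps=3, sample_batch_size=4)
assert elem.shape == (4, 3)   # [sample_batch_size, num_steps]
```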

Returns:

A dataset of type tf.data.Dataset whose elements are 2-tuples of:

  • An item, sequence of items, or batch thereof.
  • Auxiliary info for the items (e.g. ids, probs).

Raises:

  • NotImplementedError: If a non-default argument value is not supported.
  • ValueError: If the data spec contains lists that must be converted to tuples.

clear


clear()

Resets the contents of the replay buffer.

Returns:

None. The buffer contents are cleared in place.

gather_all


gather_all()

Returns all the items in the buffer.

NOTE: This method will soon be deprecated in favor of as_dataset(..., single_deterministic_pass=True).

Returns:

All the items currently in the buffer, as a tensor of shape [B, T, ...], where B = batch size, T = timesteps, and the remaining dimensions match the shape spec of the items in the buffer.

get_next


get_next(
    sample_batch_size=None, num_steps=None, time_stacked=True
)

Returns an item or batch of items from the buffer.

Args:

  • sample_batch_size: (Optional.) The number of items to return. If None (default), a single item is returned which matches the data_spec of this class (without a batch dimension). Otherwise, a batch of sample_batch_size items is returned, where each tensor in items will have its first dimension equal to sample_batch_size and the remaining dimensions match the corresponding data_spec. See examples below.
  • num_steps: (Optional.) A way to specify that sub-episodes are desired. If None (default), non-episodic replay buffers return a batch of single items, while episodic buffers return full episodes (in which case sample_batch_size must be None). Otherwise, a batch of sub-episodes is returned, where a sub-episode is a sequence of consecutive items in the replay buffer. The returned tensors will have a first dimension equal to sample_batch_size (if sample_batch_size is not None), a subsequent dimension equal to num_steps if time_stacked=True, and remaining dimensions that match the data_spec of this class. See examples below.
  • time_stacked: (Optional.) Boolean; when True and num_steps > 1, the returned items are stacked on the time dimension.

Examples of returned tensor shapes (B = batch size, T = timesteps, D = data spec shape):

    get_next(sample_batch_size=None, num_steps=None, time_stacked=True)
      return shape (non-episodic): [D]
      return shape (episodic): [T, D]
    get_next(sample_batch_size=B, num_steps=None, time_stacked=True)
      return shape (non-episodic): [B, D]
      return shape (episodic): not supported
    get_next(sample_batch_size=B, num_steps=T, time_stacked=True)
      return shape: [B, T, D]
    get_next(sample_batch_size=None, num_steps=T, time_stacked=False)
      return shape: ([D], [D], ...), a tuple of T tensors
    get_next(sample_batch_size=B, num_steps=T, time_stacked=False)
      return shape: ([B, D], [B, D], ...), a tuple of T tensors
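The effect of time_stacked on the returned shapes can be checked with a small NumPy model (shapes only, with B=2, T=3, and a scalar item spec as assumed example values):

```python
import numpy as np

B, T = 2, 3
# T consecutive batched items, each of shape [B].
steps = [np.zeros((B,)) for _ in range(T)]

stacked = np.stack(steps, axis=1)  # time_stacked=True: one array of shape [B, T]
unstacked = tuple(steps)           # time_stacked=False: a tuple of T arrays, each [B]

assert stacked.shape == (2, 3)
assert len(unstacked) == 3 and unstacked[0].shape == (2,)
```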

Returns:

A 2-tuple containing:

  • An item or sequence of (optionally batched and stacked) items.
  • Auxiliary info for the items (i.e. ids, probs).

num_frames


num_frames()

Returns the number of frames in the replay buffer.

with_name_scope

@classmethod
with_name_scope(
    cls, method
)

Decorator to automatically enter the module name scope.

import tensorflow as tf

class MyModule(tf.Module):
  @tf.Module.with_name_scope
  def __call__(self, x):
    if not hasattr(self, 'w'):
      self.w = tf.Variable(tf.random.normal([x.shape[1], 64]))
    return tf.matmul(x, self.w)

Using the above module produces tf.Variables and tf.Tensors whose names include the module name:

mod = MyModule()
mod(tf.ones([8, 32]))
# ==> <tf.Tensor: ...>
mod.w
# ==> <tf.Variable ...'my_module/w:0'>

Args:

  • method: The method to wrap.

Returns:

The original method wrapped such that it enters the module's name scope.