tf_agents.replay_buffers.reverb_utils.ReverbTrajectorySequenceObserver

Reverb trajectory sequence observer.

Inherits From: ReverbAddTrajectoryObserver

This is equivalent to ReverbAddTrajectoryObserver, except that sequences are not cut when a boundary trajectory is seen. As a result, sampled sequences may contain boundaries anywhere within them rather than only at the end.

Consider using this observer when you want to create training experience that can encompass any subsequence of the observed trajectories.
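To make the difference concrete, here is an illustrative pure-Python sketch (not the TF-Agents implementation): one helper cuts the sliding-window cache at each boundary step, as ReverbAddTrajectoryObserver does, while the other slides a window over the whole stream, as this observer does. The function names and the "B" boundary marker are hypothetical.

```python
# Hypothetical sketch: compare window formation with and without
# cutting at episode boundaries. "B" marks a boundary step.

def windows_cut_at_boundaries(steps, seq_len):
    """Windows never span a boundary: the cache restarts after each 'B'."""
    out, cache = [], []
    for s in steps:
        cache.append(s)
        if len(cache) == seq_len:
            out.append(tuple(cache))
            cache.pop(0)
        if s == "B":          # boundary seen: drop the cache
            cache = []
    return out

def windows_across_boundaries(steps, seq_len):
    """Windows slide over the whole stream; a boundary can land anywhere."""
    return [tuple(steps[i:i + seq_len])
            for i in range(len(steps) - seq_len + 1)]

steps = ["s0", "s1", "B", "s2", "s3"]
print(windows_cut_at_boundaries(steps, 2))
# the boundary only ever appears at the end of a window
print(windows_across_boundaries(steps, 2))
# also yields ("B", "s2"): the boundary sits mid-sequence
```

With cutting, the window ("B", "s2") can never be produced; with this observer's behavior it can, which is what allows training on any subsequence of the observed trajectories.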

Args
py_client Python client for the reverb replay server.
table_name The name(s) of the table(s) to which samples are written.
sequence_length The sequence_length used to write to the given table.
stride_length The integer stride for the sliding window for overlapping sequences. The default value of 1 creates an item for every window. Using L = sequence_length this means items are created for times {0, 1, .., L-1}, {1, 2, .., L}, .... In contrast, stride_length = L will create an item only for disjoint windows {0, 1, ..., L-1}, {L, ..., 2 * L - 1}, ....
priority Initial priority for new samples in the RB.
pad_end_of_episodes At the end of an episode, the cache is dropped by default. When pad_end_of_episodes = True, the cache is instead padded with zero-valued boundary steps (last->first), and padded items of sequence_length are written to Reverb. The last padded item starts with a boundary step from the episode. This ensures that the last few steps are no less likely to be sampled than middle steps, which is most useful for environments that provide meaningful rewards at the end of episodes. Note: because padding is not added at the beginning of an episode, for sequence_length = N > 1 the first N-1 steps of an episode are sampled less frequently than all other steps. This generally does not impact training performance. However, if your environment's only meaningful rewards occur at the beginning of episodes, you may consider filing a feature request to support padding at the front as well.
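The stride_length behavior described above can be sketched with a small helper. This is an illustrative function, not part of the TF-Agents API; it just enumerates the start-time ranges of the items a sliding window would create over T cached steps.

```python
# Hypothetical sketch of which items a sliding window creates for a
# given stride_length over T cached steps (illustrative only).

def item_windows(T, sequence_length, stride_length):
    """Return the time-index ranges of the items created."""
    return [list(range(start, start + sequence_length))
            for start in range(0, T - sequence_length + 1, stride_length)]

# stride_length = 1: one item per window -> {0..L-1}, {1..L}, ...
print(item_windows(6, 3, 1))
# stride_length = L: disjoint windows -> {0..L-1}, {L..2L-1}, ...
print(item_windows(6, 3, 3))
```

With 6 cached steps and sequence_length 3, stride 1 yields four overlapping items while stride 3 yields two disjoint ones, matching the {0, 1, .., L-1}, {1, 2, .., L}, ... versus {0, .., L-1}, {L, .., 2L-1} patterns in the description.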

Methods

close

Closes the writer of the observer.

open

Opens the writer of the observer.

reset

Resets the state of the observer.

Args
write_cached_steps A boolean flag indicating whether to write the cached trajectory. When True, the function attempts to write the cached data before resetting (optionally with padding); otherwise, the cached data is dropped.
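The end-of-episode padding that can accompany this write is sketched below. This is a hypothetical pure-Python model of the behavior described under pad_end_of_episodes, not the observer's internals; pad_and_write and the "pad" marker are illustrative names.

```python
# Hypothetical sketch of end-of-episode padding: the cache is extended
# with zero-valued boundary steps so every real step, including the
# last one, can start a full item of sequence_length.

def pad_and_write(cache, sequence_length, pad_value="pad"):
    """Pad the cached steps and emit every full window of sequence_length."""
    # Append sequence_length - 1 padding steps so the last emitted item
    # still starts with a real (boundary) step from the episode.
    padded = cache + [pad_value] * (sequence_length - 1)
    return [tuple(padded[i:i + sequence_length])
            for i in range(len(cache))]

print(pad_and_write(["s0", "s1", "s2"], 3))
# every step, including the final "s2", begins a window
```

Without padding, the final steps of an episode appear in fewer windows and are therefore sampled less often; padding restores their sampling frequency, which is why the default reset path offers to write the cached steps before dropping them.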

__call__

Writes the trajectory into the underlying replay buffer.

The trajectory may be passed in flattened form. No batch dimension is allowed.

Args
trajectory The trajectory to be written, which may be a (possibly nested) trajectory object or a flattened version of one. It is assumed to have no batch dimension.