tf_agents.drivers.driver.Driver

A driver that takes steps in an environment using a policy.

Args

env: An environment.Base environment.
policy: A policy.Base policy.
observers: A list of observers that are updated after the driver is run. Each observer is a callable(Trajectory) that returns the input. Trajectory.time_step is a stacked batch [N+1, batch_size, ...] of timesteps and Trajectory.action is a stacked batch [N, batch_size, ...] of actions in time-major form.
transition_observers: A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). The transition is shaped just as trajectories are for regular observers.
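
The two observer contracts described above can be sketched in plain Python. This is a minimal illustration, not TF-Agents code: the Trajectory namedtuple and the CountingObserver and TransitionLogger classes are hypothetical stand-ins, and the string placeholders take the place of real TimeStep and PolicyStep objects.

```python
from collections import namedtuple

# Hypothetical stand-in for a trajectory; NOT the TF-Agents Trajectory type.
Trajectory = namedtuple("Trajectory", ["observation", "action", "reward"])


class CountingObserver:
    """Regular observer: a callable(trajectory) that returns its input."""

    def __init__(self):
        self.count = 0

    def __call__(self, trajectory):
        self.count += 1
        return trajectory


class TransitionLogger:
    """Transition observer: a callable taking one 3-tuple argument."""

    def __init__(self):
        self.transitions = []

    def __call__(self, transition):
        # The single argument unpacks into (time_step, policy_step, next_time_step).
        time_step, policy_step, next_time_step = transition
        self.transitions.append((time_step, policy_step, next_time_step))


counter = CountingObserver()
logger = TransitionLogger()

traj = Trajectory(observation=0, action=1, reward=0.5)
returned = counter(traj)

# Placeholder strings stand in for TimeStep / PolicyStep objects.
logger(("ts0", "ps0", "ts1"))
```

Note the double parentheses in callable((TimeStep, PolicyStep, NextTimeStep)): a transition observer receives one tuple, not three positional arguments.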

Attributes

env
observers
policy
transition_observers

Methods

run

Takes steps in the environment and updates observers.
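
As a rough sketch of what a run implementation might do, the toy driver below steps a toy environment with a toy policy, notifies transition observers after every step, and notifies regular observers once the run finishes, following the argument descriptions above. Everything here is an assumption made for illustration: ToyEnv, toy_policy, ToyDriver, and the num_steps parameter are hypothetical and do not mirror the actual TF-Agents signatures or data types.

```python
class ToyEnv:
    """Hypothetical environment whose observation is a running sum of actions."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += action
        return self.t


def toy_policy(time_step):
    """Hypothetical policy that always returns the action 1."""
    return 1


class ToyDriver:
    """Illustrative driver; not the TF-Agents Driver class."""

    def __init__(self, env, policy, observers=None, transition_observers=None):
        self.env = env
        self.policy = policy
        self.observers = observers or []
        self.transition_observers = transition_observers or []

    def run(self, num_steps):  # num_steps is an assumed parameter
        time_step = self.env.reset()
        trajectory = []
        for _ in range(num_steps):
            action = self.policy(time_step)
            next_time_step = self.env.step(action)
            # Transition observers are updated after every environment step.
            for observer in self.transition_observers:
                observer((time_step, action, next_time_step))
            trajectory.append((time_step, action))
            time_step = next_time_step
        # Regular observers are updated once, after the run completes.
        for observer in self.observers:
            observer(trajectory)
        return time_step


seen_transitions = []
seen_trajectories = []
driver = ToyDriver(
    ToyEnv(),
    toy_policy,
    observers=[seen_trajectories.append],
    transition_observers=[seen_transitions.append],
)
final_time_step = driver.run(num_steps=3)
```

Passing list.append as an observer works because an observer only needs to be callable with a single argument.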