A driver that takes steps in an environment using a policy.
tf_agents.drivers.driver.Driver(env, policy, observers=None, transition_observers=None)
env: An environment.Base environment.
policy: A policy.Base policy.
observers: A list of observers that are updated after the driver is run. Each observer is a callable(Trajectory) that returns its input. Trajectory.time_step is a stacked batch of time steps with shape [N+1, batch_size, ...], and Trajectory.action is a stacked batch of actions with shape [N, batch_size, ...], both in time-major form.
transition_observers: A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). The transition is shaped just as trajectories are for regular observers.
Takes steps in the environment and updates observers.
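The relationship between the two observer lists can be illustrated with a small pure-Python sketch. It does not use TF-Agents and is not the library's implementation: ToyEnv, ToyPolicy, ToyDriver, and the simplified TimeStep and Trajectory tuples are hypothetical stand-ins, there is no batch dimension, and the timing of observer calls follows the argument descriptions above rather than any particular driver subclass.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class TimeStep(NamedTuple):
    """Hypothetical stand-in for a time step (no batch dimension)."""
    observation: int
    is_last: bool


class Trajectory(NamedTuple):
    """Hypothetical stand-in: N+1 time steps and N actions, time-major."""
    time_step: List[TimeStep]
    action: List[int]


Transition = Tuple[TimeStep, int, TimeStep]


class ToyEnv:
    """Hypothetical counter environment; the episode ends at `limit`."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.state = 0

    def reset(self) -> TimeStep:
        self.state = 0
        return TimeStep(self.state, False)

    def step(self, action: int) -> TimeStep:
        self.state += action
        return TimeStep(self.state, self.state >= self.limit)


class ToyPolicy:
    """Hypothetical policy that always picks action 1."""

    def action(self, time_step: TimeStep) -> int:
        return 1


class ToyDriver:
    """Sketch of a driver: steps an environment with a policy, notifies observers."""

    def __init__(
        self,
        env: ToyEnv,
        policy: ToyPolicy,
        observers: Optional[List[Callable[[Trajectory], Trajectory]]] = None,
        transition_observers: Optional[List[Callable[[Transition], None]]] = None,
    ) -> None:
        self.env = env
        self.policy = policy
        self.observers = observers or []
        self.transition_observers = transition_observers or []

    def run(self, max_steps: int) -> TimeStep:
        time_step = self.env.reset()
        time_steps, actions = [time_step], []
        for _ in range(max_steps):
            action = self.policy.action(time_step)
            next_time_step = self.env.step(action)
            # Transition observers are notified after every environment step.
            for observer in self.transition_observers:
                observer((time_step, action, next_time_step))
            time_steps.append(next_time_step)
            actions.append(action)
            time_step = next_time_step
            if time_step.is_last:
                break
        # Regular observers receive the stacked trajectory once the run is done.
        trajectory = Trajectory(time_steps, actions)
        for observer in self.observers:
            observer(trajectory)
        return time_step


seen_trajectories: List[Trajectory] = []
seen_transitions: List[Transition] = []


def record_trajectory(trajectory: Trajectory) -> Trajectory:
    seen_trajectories.append(trajectory)
    return trajectory  # observers return their input


driver = ToyDriver(
    ToyEnv(limit=3),
    ToyPolicy(),
    observers=[record_trajectory],
    transition_observers=[seen_transitions.append],
)
final_time_step = driver.run(max_steps=10)
```

In this sketch the run stops after three steps, so the recorded trajectory holds four time steps and three actions, mirroring the [N+1, ...] and [N, ...] shapes described for observers, while the transition observer is called once per step.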