tf_agents.bandits.drivers.driver_utils.trajectory_for_bandit

Builds a trajectory from a single-step bandit episode.

Since every episode consists of a single step, the returned Trajectory has no time dimension. All input and output tensors and arrays are expected to have a leading batch dimension, that is, shape [batch_size, ...].
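To make the shape convention concrete, the sketch below builds NumPy arrays for a hypothetical batch of four single-step episodes with three-dimensional observations. The array names and sizes are illustrative assumptions, not part of the TF-Agents API. The point is only that the leading axis of every array is the batch axis and that no time axis is present.

```python
import numpy as np

batch_size, obs_dim = 4, 3  # illustrative sizes (assumption)

# One row per single-step episode: shape [batch_size, obs_dim], no time axis.
observations = np.zeros((batch_size, obs_dim), dtype=np.float32)
# One action and one reward per episode: shape [batch_size].
actions = np.array([0, 2, 1, 0], dtype=np.int32)
rewards = np.array([1.0, 0.0, 0.5, 1.0], dtype=np.float32)

for name, arr in [("observations", observations),
                  ("actions", actions),
                  ("rewards", rewards)]:
    # Every array shares the same leading batch dimension.
    assert arr.shape[0] == batch_size
    print(name, arr.shape)

# For contrast: data that kept a time dimension would carry an extra axis,
# e.g. [batch_size, time, obs_dim]. Inserting a length-1 axis shows the layout.
with_time_axis = np.expand_dims(observations, axis=1)
print("with time axis", with_time_axis.shape)
```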

Args:
  initial_step: A TimeStep returned from environment.step(...).
  action_step: A PolicyStep returned by policy.action(...).
  final_step: A TimeStep returned from environment.step(...).

Returns:
  A Trajectory containing zeros for the discount value and StepType.LAST for both step_type and next_step_type.
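The sketch below assembles such a single-step trajectory with NumPy and NamedTuple stand-ins, so the field wiring is visible without installing TF-Agents. TimeStepSketch, PolicyStepSketch, TrajectorySketch and trajectory_for_bandit_sketch are hypothetical names, not library classes. The zero discount and StepType.LAST step types follow the return description above; taking the observation from initial_step, the action and policy info from action_step, and the reward from final_step is an assumption about how the inputs are wired, not something this page states.

```python
from typing import Any, NamedTuple

import numpy as np

STEP_TYPE_LAST = 2  # stand-in for StepType.LAST (FIRST=0, MID=1, LAST=2)


class TimeStepSketch(NamedTuple):
    """Hypothetical stand-in for a TimeStep (only the fields used here)."""
    step_type: np.ndarray
    reward: np.ndarray
    discount: np.ndarray
    observation: np.ndarray


class PolicyStepSketch(NamedTuple):
    """Hypothetical stand-in for a PolicyStep."""
    action: np.ndarray
    state: Any = ()
    info: Any = ()


class TrajectorySketch(NamedTuple):
    """Hypothetical stand-in for a Trajectory with no time dimension."""
    step_type: np.ndarray
    observation: np.ndarray
    action: np.ndarray
    policy_info: Any
    next_step_type: np.ndarray
    reward: np.ndarray
    discount: np.ndarray


def trajectory_for_bandit_sketch(initial_step, action_step, final_step):
    """Builds a batched single-step trajectory; field sources are assumed."""
    batch_size = np.shape(final_step.reward)[0]
    last = np.full((batch_size,), STEP_TYPE_LAST, dtype=np.int32)
    return TrajectorySketch(
        step_type=last,                        # StepType.LAST, per the docs
        observation=initial_step.observation,  # assumed: context the policy saw
        action=action_step.action,             # assumed: the chosen arm
        policy_info=action_step.info,          # assumed: extra policy outputs
        next_step_type=last,                   # StepType.LAST, per the docs
        reward=final_step.reward,              # assumed: reward for that action
        discount=np.zeros((batch_size,), dtype=np.float32),  # zeros, per docs
    )


# Usage with a batch of two episodes.
obs = np.ones((2, 3), dtype=np.float32)
initial = TimeStepSketch(step_type=np.zeros(2, np.int32),
                         reward=np.zeros(2, np.float32),
                         discount=np.ones(2, np.float32),
                         observation=obs)
action = PolicyStepSketch(action=np.array([1, 0], dtype=np.int32))
final = TimeStepSketch(step_type=np.full(2, STEP_TYPE_LAST, np.int32),
                       reward=np.array([0.5, 1.0], np.float32),
                       discount=np.zeros(2, np.float32),
                       observation=obs)
traj = trajectory_for_bandit_sketch(initial, action, final)
print(traj.step_type, traj.next_step_type, traj.discount, traj.reward)
```

Because every field keeps the shape [batch_size, ...], the result can be stored or batched exactly like the inputs, with no time axis to add or strip.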