tf_agents.bandits.drivers.driver_utils.trajectory_for_bandit

Builds a trajectory from a single-step bandit episode.

tf_agents.bandits.drivers.driver_utils.trajectory_for_bandit(
    initial_step: tf_agents.typing.types.TimeStep,
    action_step: tf_agents.typing.types.PolicyStep,
    final_step: tf_agents.typing.types.TimeStep
) -> tf_agents.typing.types.NestedTensor

Since all episodes consist of a single step, the returned `Trajectory` has no time dimension. All input and output `Tensor`s/arrays are expected to have shape `[batch_size, ...]`.

Args
`initial_step`	A `TimeStep` returned from `environment.step(...)`.
`action_step`	A `PolicyStep` returned by `policy.action(...)`.
`final_step`	A `TimeStep` returned from `environment.step(...)`.

Returns
A `Trajectory` containing zeros for discount value and `StepType.LAST` for both `step_type` and `next_step_type`.
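
The documented return value can be illustrated with a small, library-free sketch. It is not the TF-Agents implementation: the `TimeStep`, `PolicyStep`, and `Trajectory` namedtuples below are hypothetical stand-ins that hold Python lists instead of tensors, and `sketch_trajectory_for_bandit` is an illustrative name. The sketch also assumes, beyond what is stated above, that the observation comes from `initial_step`, the action and policy info from `action_step`, and the reward from `final_step`.

```python
from collections import namedtuple

# Hypothetical stand-ins for the TF-Agents containers. The real classes live in
# tf_agents.trajectories and hold tensors, not Python lists.
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])
PolicyStep = namedtuple("PolicyStep", ["action", "state", "info"])
Trajectory = namedtuple(
    "Trajectory",
    ["step_type", "observation", "action", "policy_info",
     "next_step_type", "reward", "discount"],
)

LAST = 2  # integer value of StepType.LAST (FIRST=0, MID=1, LAST=2)


def sketch_trajectory_for_bandit(initial_step, action_step, final_step):
    """Illustrative only: mirrors the documented return value."""
    batch_size = len(final_step.reward)
    return Trajectory(
        step_type=[LAST] * batch_size,          # documented: StepType.LAST
        observation=initial_step.observation,   # assumption: context seen before acting
        action=action_step.action,              # assumption: action chosen by the policy
        policy_info=action_step.info,           # assumption: policy side information
        next_step_type=[LAST] * batch_size,     # documented: StepType.LAST
        reward=final_step.reward,               # assumption: reward received after acting
        discount=[0.0] * batch_size,            # documented: zeros
    )


# A batch of two single-step episodes: every field has leading dimension 2
# and there is no time dimension.
initial = TimeStep(step_type=[0, 0], reward=[0.0, 0.0], discount=[1.0, 1.0],
                   observation=[[0.1, 0.2], [0.3, 0.4]])
chosen = PolicyStep(action=[1, 0], state=(), info=())
final = TimeStep(step_type=[2, 2], reward=[1.0, 0.5], discount=[0.0, 0.0],
                 observation=[[0.0, 0.0], [0.0, 0.0]])

traj = sketch_trajectory_for_bandit(initial, chosen, final)
print(traj.reward, traj.discount, traj.step_type)
```

With the real library, the three arguments would instead be batched `TimeStep` and `PolicyStep` objects produced by the environment and the policy.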
