# tf_agents.trajectories.to_n_step_transition

Create an N-step transition from a trajectory with `T = N + 1` frames.

The output transition's `next_time_step.{reward, discount}` contains the N-step discounted reward and discount, computed as follows (`g` is `gamma`; `r_k` and `d_k` are the reward and discount at trajectory frame `k`):

```
next_time_step.reward = r_t +
                        g^{1} * d_t * r_{t+1} +
                        g^{2} * d_t * d_{t+1} * r_{t+2} +
                        g^{3} * d_t * d_{t+1} * d_{t+2} * r_{t+3} +
                        ...
                        g^{N-1} * d_t * ... * d_{t+N-2} * r_{t+N-1}
next_time_step.discount = g^{N-1} * d_t * d_{t+1} * ... * d_{t+N-1}
```
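To make the formulas concrete, here is a minimal pure-Python sketch that evaluates them term by term for a single, unbatched trajectory. The helper `n_step_reward_and_discount` is hypothetical (it is not part of TF-Agents) and takes plain lists with `T = N + 1` entries; like the formulas, it never uses the reward of the last frame.

```python
from math import prod


def n_step_reward_and_discount(rewards, discounts, gamma):
    """Hypothetical helper: evaluate the N-step formulas for one trajectory.

    `rewards` and `discounts` are plain lists with T = N + 1 entries.
    Only the first N entries (frames t .. t+N-1) are used.
    """
    n = len(rewards) - 1  # N = T - 1
    if n < 1:
        raise ValueError("need at least 2 frames")
    reward = 0.0
    for k in range(n):
        # Term k: g^k * d_t * ... * d_{t+k-1} * r_{t+k}; the product is 1 for k == 0.
        reward += gamma**k * prod(discounts[:k]) * rewards[k]
    # Final discount: g^{N-1} * d_t * ... * d_{t+N-1}.
    discount = gamma**(n - 1) * prod(discounts[:n])
    return reward, discount
```

With `rewards=[1.0, 2.0, 3.0, 4.0]`, `discounts=[1.0, 1.0, 0.0, 1.0]` and `gamma=0.5`, this returns `(2.75, 0.0)`: the zero discount at index 2 does not affect `r_{t+2}` (it would only scale later rewards), but it zeroes the final discount.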

#### In Python notation

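```python
import tensorflow as tf
from tf_agents.utils import value_ops

discount = gamma**(N - 1) * tf.reduce_prod(trajectory.discount[:, :-1], axis=1)
reward = value_ops.discounted_return(
    rewards=trajectory.reward[:, :-1],
    discounts=gamma * trajectory.discount[:, :-1],
    time_major=False, provide_all_returns=False)
```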
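The `discounted_return` call evaluates the same sum as the formula above. One way to see this is the backward recursion `Q_k = r_k + c_k * Q_{k+1}` with `Q_N = 0` and `c_k = gamma * d_k`; unrolling it from `Q_0` yields exactly the N-step reward. The pure-Python sketch below applies that recursion row by row to a batch stored as nested lists of shape `[B, T]`. The helper names are hypothetical, and this is an illustration of the arithmetic, not the TF-Agents implementation.

```python
from math import prod


def first_discounted_return(rewards, discounts):
    """Backward recursion Q_k = r_k + c_k * Q_{k+1}, starting from Q_N = 0.

    Returns Q_0 = r_0 + c_0*r_1 + c_0*c_1*r_2 + ... for one row.
    """
    ret = 0.0
    for r, c in zip(reversed(rewards), reversed(discounts)):
        ret = r + c * ret
    return ret


def n_step_batch(reward, discount, gamma):
    """Hypothetical batched helper; `reward` and `discount` are [B][T] lists."""
    out_reward, out_discount = [], []
    for r_row, d_row in zip(reward, discount):
        n = len(d_row) - 1  # N = T - 1
        r_in = r_row[:-1]   # like trajectory.reward[:, :-1]
        d_in = d_row[:-1]   # like trajectory.discount[:, :-1]
        out_reward.append(first_discounted_return(r_in, [gamma * d for d in d_in]))
        out_discount.append(gamma ** (n - 1) * prod(d_in))
    return out_reward, out_discount
```

For `reward=[[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]`, `discount=[[1.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]` and `gamma=0.5`, this returns `([2.75, 1.75], [0.0, 0.25])`. Note that `c_{N-1}` only multiplies the zero starting value, so `d_{t+N-1}` affects the output discount but never the reward, matching the formula.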

When `trajectory.discount[:, :-1]` is an all-ones tensor, this is equivalent to:

```python
next_time_step.discount = (
    gamma**(N - 1) * tf.ones_like(trajectory.discount[:, 0]))
next_time_step.reward = sum(
    gamma**n * trajectory.reward[:, n] for n in range(N))
```
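Writing the general formula in product notation makes the reduction explicit: when every discount in the window is 1, each product over `d` equals 1 and only the powers of `g` remain.

```latex
\begin{aligned}
\text{reward} &= \sum_{n=0}^{N-1} g^{n} \Bigl(\prod_{k=0}^{n-1} d_{t+k}\Bigr) r_{t+n}
  = \sum_{n=0}^{N-1} g^{n} r_{t+n}
  && \text{when } d_{t} = \dots = d_{t+N-1} = 1, \\
\text{discount} &= g^{N-1} \prod_{k=0}^{N-1} d_{t+k} = g^{N-1}
  && \text{when } d_{t} = \dots = d_{t+N-1} = 1.
\end{aligned}
```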

#### Args

| Argument | Description |
| --- | --- |
| `trajectory` | An instance of `Trajectory`. The tensors in `Trajectory` must have shape `[B, T, ...]`. `discount` is assumed to be a scalar float, hence the shape of `trajectory.discount` must be `[B, T]`. |
| `gamma` | A floating-point scalar; the discount factor. |

#### Returns

An N-step `Transition` where `N = T - 1`. The reward and discount in `time_step.{reward, discount}` are NaN. The N-step discounted reward and final discount are stored in `next_time_step.{reward, discount}`. All tensors in the `Transition` have shape `[B, ...]` (no time dimension).

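#### Raises

| Exception | Condition |
| --- | --- |
| `ValueError` | If `trajectory.discount.shape.rank != 2`. |
| `ValueError` | If `trajectory.discount.shape[1] < 2`, i.e. the trajectory has fewer than two frames. |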
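The two `ValueError` conditions above, together with `N = T - 1`, can be mirrored on a plain shape tuple. The helper below is hypothetical and only illustrates the documented rules; it is not the TF-Agents validation code.

```python
def n_steps_from_discount_shape(shape):
    """Hypothetical check mirroring the documented errors; returns N = T - 1.

    `shape` is a plain tuple standing in for `trajectory.discount.shape`.
    """
    if len(shape) != 2:
        raise ValueError(
            f"trajectory.discount must have rank 2 ([B, T]); got rank {len(shape)}")
    if shape[1] < 2:
        raise ValueError(
            f"trajectory.discount needs at least 2 frames; got T={shape[1]}")
    return shape[1] - 1  # N = T - 1


# Usage: a batch of 32 trajectories with 4 frames each gives 3-step transitions.
print(n_steps_from_discount_shape((32, 4)))  # 3
```

Shapes such as `(32,)` (rank 1) or `(32, 1)` (a single frame) raise `ValueError`.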

