TensorFlow provides a number of methods for constructing Recurrent Neural Networks. Most accept an RNNCell-subclassed object (see the documentation for tf.nn.rnn_cell).
tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
Creates a recurrent neural network specified by RNNCell cell.
This function is functionally identical to the function rnn documented below, but performs fully dynamic unrolling of inputs.
Unlike rnn, the input inputs is not a Python list of Tensors, one for each frame. Instead, inputs may be a single Tensor where the maximum time is either the first or second dimension (see the parameter time_major). Alternatively, it may be a (possibly nested) tuple of Tensors, each of them having matching batch and time dimensions.
The corresponding output is either a single Tensor having the same number of time steps and batch size, or a (possibly nested) tuple of such tensors, matching the nested structure of cell.output_size.
The parameter sequence_length is optional and is used to copy through state and zero out outputs when past a batch element's sequence length. So it's more for correctness than performance, unlike in rnn().
Args:
cell: An instance of RNNCell.
inputs: The RNN inputs. If time_major == False (default), this must be a Tensor of shape: [batch_size, max_time, ...], or a nested tuple of such elements. If time_major == True, this must be a Tensor of shape: [max_time, batch_size, ...], or a nested tuple of such elements. This may also be a (possibly nested) tuple of Tensors satisfying this property. The first two dimensions must match across all the inputs, but otherwise the ranks and other shape components may differ. In this case, input to cell at each time step will replicate the structure of these tuples, except for the time dimension (from which the time is taken). The input to cell at each time step will be a Tensor or (possibly nested) tuple of Tensors each with dimensions [batch_size, ...].
sequence_length: (optional) An int32/int64 vector sized [batch_size].
initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
dtype: (optional) The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the inputs and outputs Tensors. If true, these Tensors must be shaped [max_time, batch_size, depth]. If false, these Tensors must be shaped [batch_size, max_time, depth]. Using time_major = True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "RNN".
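The two layouts selected by time_major differ only in the order of the first two dimensions. As a rough illustration, the sketch below uses plain Python nested lists as stand-ins for Tensors and swaps those two axes. The helper name to_time_major is hypothetical and is not part of TensorFlow; it only mimics the kind of transpose that is avoided when the caller already supplies time-major data.

```python
def to_time_major(batch_major):
    """Swap the first two axes of a nested list shaped [batch_size, max_time, depth].

    Hypothetical helper for illustration only (not a TensorFlow function).
    """
    batch_size = len(batch_major)
    max_time = len(batch_major[0])
    return [[batch_major[b][t] for b in range(batch_size)]
            for t in range(max_time)]


# Two sequences (batch_size=2), three time steps (max_time=3), depth=1.
batch_major = [
    [[1.0], [2.0], [3.0]],
    [[4.0], [5.0], [6.0]],
]
time_major = to_time_major(batch_major)
print(len(time_major), len(time_major[0]))  # 3 2
print(time_major[0])  # [[1.0], [4.0]]
```

Applying the same swap a second time returns the original batch-major layout.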
Returns:
A pair (outputs, state) where:
outputs: The RNN output Tensor. If time_major == False (default), this will be a Tensor shaped: [batch_size, max_time, cell.output_size]. If time_major == True, this will be a Tensor shaped: [max_time, batch_size, cell.output_size]. Note, if cell.output_size is a (possibly nested) tuple of integers or TensorShape objects, then outputs will be a tuple having the same structure as cell.output_size, containing Tensors having shapes corresponding to the shape data in cell.output_size.
state: The final state. If cell.state_size is an int, this will be shaped [batch_size, cell.state_size]. If it is a TensorShape, this will be shaped [batch_size] + cell.state_size. If it is a (possibly nested) tuple of ints or TensorShape, this will be a tuple having the corresponding shapes.
Raises:
TypeError: If cell is not an instance of RNNCell.
ValueError: If inputs is None or an empty list.
tf.nn.rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)
Creates a recurrent neural network specified by RNNCell cell.
The simplest form of RNN network generated is:
    state = cell.zero_state(...)
    outputs = []
    for input_ in inputs:
        output, state = cell(input_, state)
        outputs.append(output)
    return (outputs, state)
However, a few other options are available:
An initial state can be provided. If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time), and properly propagates the state at an example's sequence length to the final state output.
The dynamic calculation performed is, at time t for batch row b,
    (output, state)(b, t) =
      (t >= sequence_length(b))
        ? (zeros(cell.output_size), states(b, sequence_length(b) - 1))
        : cell(input(b, t), state(b, t - 1))
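The rule above can be sketched in plain Python. This is an illustrative toy rather than TensorFlow code: the cell is a hypothetical scalar accumulator (toy_cell), each batch row carries one float per time step, and the function name static_rnn_with_lengths is made up for this example. It shows only the per-row masking, not the early stop at the maximum sequence length of the minibatch.

```python
def toy_cell(x, state):
    """Hypothetical cell: new state = old state + input; output = new state."""
    new_state = state + x
    return new_state, new_state


def static_rnn_with_lengths(inputs, sequence_length, initial_state=0.0):
    """Apply toy_cell per batch row, masking steps at or past each row's length.

    inputs: list of length T; inputs[t][b] is the scalar input for row b at time t.
    sequence_length: list of ints, one per batch row.
    Returns (outputs, final_states). outputs[t][b] is zero once
    t >= sequence_length[b], and the state of that row is copied through.
    """
    batch_size = len(sequence_length)
    states = [initial_state] * batch_size
    outputs = []
    for t, step_inputs in enumerate(inputs):
        step_outputs = []
        for b in range(batch_size):
            if t >= sequence_length[b]:
                # Past the end of this row: emit zeros, keep the last valid state.
                step_outputs.append(0.0)
            else:
                out, states[b] = toy_cell(step_inputs[b], states[b])
                step_outputs.append(out)
        outputs.append(step_outputs)
    return outputs, states


inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # T=3, batch_size=2
outputs, final_states = static_rnn_with_lengths(inputs, sequence_length=[3, 1])
print(outputs)       # [[1.0, 10.0], [3.0, 0.0], [6.0, 0.0]]
print(final_states)  # [6.0, 10.0]
```

The second row has length 1, so its outputs are zero from time 1 onward and its final state is the state after its single valid step.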
Args:
cell: An instance of RNNCell.
inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.
initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
dtype: (optional) The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size [batch_size], values in [0, T).
scope: VariableScope for the created subgraph; defaults to "RNN".
Returns:
A pair (outputs, state) where:
outputs is a length T list of outputs (one for each input), or a nested tuple of such elements.
state is the final state.
Raises:
TypeError: If cell is not an instance of RNNCell.
ValueError: If inputs is None or an empty list, or if the input depth (column size) cannot be inferred from inputs via shape inference.
tf.nn.state_saving_rnn(cell, inputs, state_saver, state_name, sequence_length=None, scope=None)
RNN that accepts a state saver for time-truncated RNN calculation.
Args:
cell: An instance of RNNCell.
inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size].
state_saver: A state saver object with methods state and save_state.
state_name: Python string or tuple of strings. The name to use with the state_saver. If the cell returns tuples of states (i.e., cell.state_size is a tuple) then state_name should be a tuple of strings having the same length as cell.state_size. Otherwise it should be a single string.
sequence_length: (optional) An int32/int64 vector size [batch_size]. See the documentation for rnn() for more details about sequence_length.
scope: VariableScope for the created subgraph; defaults to "RNN".
Returns:
A pair (outputs, state) where:
outputs is a length T list of outputs (one for each input).
state is the final state.
Raises:
TypeError: If cell is not an instance of RNNCell.
ValueError: If inputs is None or an empty list, or if the arity and type of state_name do not match those of cell.state_size.
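As a hedged sketch of the state_saver contract described above, the class below keeps states in a dictionary keyed by state_name. The class name DictStateSaver, the helper run_truncated_segment, and the exact method signatures (state(name) returning the saved value, save_state(name, value) storing it) are assumptions made for illustration; the text above only specifies that the two methods exist. Plain Python floats stand in for Tensors and a running sum stands in for the cell.

```python
class DictStateSaver:
    """Hypothetical state saver keeping one value per state name."""

    def __init__(self, initial_states):
        self._states = dict(initial_states)

    def state(self, name):
        """Return the state saved under `name`."""
        return self._states[name]

    def save_state(self, name, value):
        """Store `value` under `name` so the next segment can resume from it."""
        self._states[name] = value


def run_truncated_segment(inputs, state_saver, state_name):
    """Run a toy accumulator over one segment, resuming from the saved state."""
    state = state_saver.state(state_name)
    outputs = []
    for x in inputs:
        state = state + x  # toy cell: running sum
        outputs.append(state)
    state_saver.save_state(state_name, state)
    return outputs, state


saver = DictStateSaver({"rnn_state": 0.0})
# A long sequence processed as two truncated segments.
first_outputs, _ = run_truncated_segment([1.0, 2.0], saver, "rnn_state")
second_outputs, final_state = run_truncated_segment([3.0, 4.0], saver, "rnn_state")
print(first_outputs, second_outputs, final_state)  # [1.0, 3.0] [6.0, 10.0] 10.0
```

The second segment starts from the state left behind by the first, which is the point of time-truncated calculation: a long sequence is processed in pieces without losing the recurrent state between them.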
tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
Creates a dynamic version of bidirectional recurrent neural network.
Similar to the unidirectional case above (rnn) but takes input and builds independent forward and backward RNNs. The input_size of forward and backward cell must match. The initial state for both directions is zero by default (but can be set optionally) and no intermediate states are ever returned; the network is fully unrolled for the given (passed in) length(s) of the sequence(s) or completely unrolled if length(s) is not given.
Args:
cell_fw: An instance of RNNCell, to be used for forward direction.
cell_bw: An instance of RNNCell, to be used for backward direction.
inputs: The RNN inputs. If time_major == False (default), this must be a tensor of shape: [batch_size, max_time, input_size]. If time_major == True, this must be a tensor of shape: [max_time, batch_size, input_size].
sequence_length: (optional) An int32/int64 vector, size [batch_size], containing the actual lengths for each of the sequences.
initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell_fw.state_size.
initial_state_bw: (optional) Same as for initial_state_fw, but using the corresponding properties of cell_bw.
dtype: (optional) The data type for the initial states and expected output. Required if initial_states are not provided or RNN states have a heterogeneous dtype.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the inputs and outputs Tensors. If true, these Tensors must be shaped [max_time, batch_size, depth]. If false, these Tensors must be shaped [batch_size, max_time, depth]. Using time_major = True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "BiRNN".
Returns:
A tuple (outputs, output_states) where:
outputs: A tuple (output_fw, output_bw) containing the forward and the backward rnn output Tensor. If time_major == False (default), output_fw will be a Tensor shaped: [batch_size, max_time, cell_fw.output_size] and output_bw will be a Tensor shaped: [batch_size, max_time, cell_bw.output_size]. If time_major == True, output_fw will be a Tensor shaped: [max_time, batch_size, cell_fw.output_size] and output_bw will be a Tensor shaped: [max_time, batch_size, cell_bw.output_size]. It returns a tuple instead of a single concatenated Tensor, unlike in bidirectional_rnn. If the concatenated one is preferred, the forward and backward outputs can be concatenated as tf.concat(2, outputs).
output_states: A tuple (output_state_fw, output_state_bw) containing the forward and the backward final states of bidirectional rnn.
Raises:
TypeError: If cell_fw or cell_bw is not an instance of RNNCell.
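To make the (output_fw, output_bw) pair and the optional depth concatenation concrete, here is a minimal plain-Python sketch using batch-major nested lists in place of Tensors. The toy RNN (a running sum) and the helper names run_toy_rnn, bidirectional_outputs and concat_depth are hypothetical, and sequence_length handling is left out for brevity, so every sequence is assumed to have full length.

```python
def run_toy_rnn(sequence):
    """Hypothetical RNN: the output at each step is the running sum so far."""
    state = 0.0
    outputs = []
    for x in sequence:
        state += x
        outputs.append([state])  # depth-1 output vector
    return outputs


def bidirectional_outputs(batch):
    """Return (output_fw, output_bw), each shaped [batch_size, max_time, 1]."""
    output_fw = [run_toy_rnn(seq) for seq in batch]
    # Backward direction: process the reversed sequence, then reverse the
    # outputs so that index t lines up with the forward output at time t.
    output_bw = [run_toy_rnn(seq[::-1])[::-1] for seq in batch]
    return output_fw, output_bw


def concat_depth(output_fw, output_bw):
    """Concatenate along the last (depth) axis, i.e. axis 2."""
    return [[fw_t + bw_t for fw_t, bw_t in zip(fw_seq, bw_seq)]
            for fw_seq, bw_seq in zip(output_fw, output_bw)]


batch = [[1.0, 2.0, 3.0]]  # batch_size=1, max_time=3
output_fw, output_bw = bidirectional_outputs(batch)
print(concat_depth(output_fw, output_bw))
# [[[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]]
```

Each time step of the concatenated result holds the forward output followed by the backward output, so its depth is the sum of the two output sizes.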
tf.nn.bidirectional_rnn(cell_fw, cell_bw, inputs, initial_state_fw=None, initial_state_bw=None, dtype=None, sequence_length=None, scope=None)
Creates a bidirectional recurrent neural network.
Similar to the unidirectional case above (rnn) but takes input and builds independent forward and backward RNNs with the final forward and backward outputs depth-concatenated, such that the output will have the format [time][batch][cell_fw.output_size + cell_bw.output_size]. The input_size of forward and backward cell must match. The initial state for both directions is zero by default (but can be set optionally) and no intermediate states are ever returned; the network is fully unrolled for the given (passed in) length(s) of the sequence(s) or completely unrolled if length(s) is not given.
Args:
cell_fw: An instance of RNNCell, to be used for forward direction.
cell_bw: An instance of RNNCell, to be used for backward direction.
inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements.
initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell_fw.state_size.
initial_state_bw: (optional) Same as for initial_state_fw, but using the corresponding properties of cell_bw.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.
sequence_length: (optional) An int32/int64 vector, size [batch_size], containing the actual lengths for each of the sequences.
scope: VariableScope for the created subgraph; defaults to "BiRNN".
Returns:
A tuple (outputs, output_state_fw, output_state_bw) where:
outputs is a length T list of outputs (one for each input), which are depth-concatenated forward and backward outputs.
output_state_fw is the final state of the forward rnn.
output_state_bw is the final state of the backward rnn.
Raises:
TypeError: If cell_fw or cell_bw is not an instance of RNNCell.
ValueError: If inputs is None or an empty list.
tf.nn.raw_rnn(cell, loop_fn, parallel_iterations=None, swap_memory=False, scope=None)
Creates an RNN specified by RNNCell cell and loop function loop_fn.
NOTE: This method is still in testing, and the API may change.
This function is a more primitive version of dynamic_rnn that provides more direct access to the inputs each iteration. It also provides more control over when to start and finish reading the sequence, and what to emit for the output.
For example, it can be used to implement the dynamic decoder of a seq2seq model.
Instead of working with Tensor objects, most operations work with TensorArray objects directly.
The operation of raw_rnn, in pseudocode, is basically the following:
    time = tf.constant(0, dtype=tf.int32)
    (finished, next_input, initial_state, _, loop_state) = loop_fn(
        time=time, cell_output=None, cell_state=None, loop_state=None)
    emit_ta = TensorArray(dynamic_size=True, dtype=initial_state.dtype)
    state = initial_state
    while not all(finished):
        (output, cell_state) = cell(next_input, state)
        (next_finished, next_input, next_state, emit, loop_state) = loop_fn(
            time=time + 1, cell_output=output, cell_state=cell_state,
            loop_state=loop_state)
        # Emit zeros and copy forward state for minibatch entries that are finished.
        state = tf.select(finished, state, next_state)
        emit = tf.select(finished, tf.zeros_like(emit), emit)
        emit_ta = emit_ta.write(time, emit)
        # If any new minibatch entries are marked as finished, mark these.
        finished = tf.logical_or(finished, next_finished)
        time += 1
    return (emit_ta, state, loop_state)
with the additional properties that output and state may be (possibly nested) tuples, as determined by cell.output_size and cell.state_size, and as a result the final state and emit_ta may themselves be tuples.
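The control flow of the pseudocode can be traced with a small plain-Python simulation. This is only a sketch under simplifying assumptions: Python lists replace Tensors and TensorArrays, each batch row holds a single float, the cell is a hypothetical running sum (toy_cell), and simulate_raw_rnn is a made-up name rather than the TensorFlow implementation.

```python
def toy_cell(x, state):
    """Hypothetical cell over one scalar per batch row: state += input."""
    new_state = [s + xi for s, xi in zip(state, x)]
    return new_state, new_state


def simulate_raw_rnn(cell, loop_fn):
    """Follow the raw_rnn pseudocode with lists standing in for tensors."""
    time = 0
    finished, next_input, state, _, loop_state = loop_fn(time, None, None, None)
    emitted = []  # stands in for emit_ta
    while not all(finished):
        output, cell_state = cell(next_input, state)
        next_finished, next_input, next_state, emit, loop_state = loop_fn(
            time + 1, output, cell_state, loop_state)
        # Emit zeros and copy the state forward for rows already finished.
        state = [s if done else ns
                 for done, s, ns in zip(finished, state, next_state)]
        emit = [0.0 if done else e for done, e in zip(finished, emit)]
        emitted.append(emit)
        # Rows newly reported as finished stay finished from now on.
        finished = [a or b for a, b in zip(finished, next_finished)]
        time += 1
    return emitted, state, loop_state


inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # [max_time][batch_size]
sequence_length = [3, 1]


def loop_fn(time, cell_output, cell_state, loop_state):
    # First call (cell_output is None): supply the initial cell state.
    next_cell_state = [0.0, 0.0] if cell_output is None else cell_state
    elements_finished = [time >= n for n in sequence_length]
    # Feed zeros once every row is done, otherwise read the next time step.
    next_input = [0.0, 0.0] if all(elements_finished) else inputs[time]
    return elements_finished, next_input, next_cell_state, cell_output, None


emitted, final_state, _ = simulate_raw_rnn(toy_cell, loop_fn)
print(emitted)      # [[1.0, 10.0], [3.0, 0.0], [6.0, 0.0]]
print(final_state)  # [6.0, 10.0]
```

The second row finishes after one step, so later emits for it are zero and its state is carried forward unchanged, which matches the comment in the pseudocode about finished minibatch entries.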
A simple implementation of dynamic_rnn via raw_rnn looks like this:
    import tensorflow as tf

    # max_time, batch_size, input_depth and num_units are Python ints
    # defined by the caller.
    inputs = tf.placeholder(shape=(max_time, batch_size, input_depth),
                            dtype=tf.float32)
    sequence_length = tf.placeholder(shape=(batch_size,), dtype=tf.int32)
    inputs_ta = tf.TensorArray(dtype=tf.float32, size=max_time)
    inputs_ta = inputs_ta.unpack(inputs)

    cell = tf.nn.rnn_cell.LSTMCell(num_units)

    def loop_fn(time, cell_output, cell_state, loop_state):
        emit_output = cell_output  # == None for time == 0
        if cell_output is None:  # time == 0
            next_cell_state = cell.zero_state(batch_size, tf.float32)
        else:
            next_cell_state = cell_state
        elements_finished = (time >= sequence_length)
        finished = tf.reduce_all(elements_finished)
        # Read the next input unless every sequence in the batch is finished.
        next_input = tf.cond(
            finished,
            lambda: tf.zeros([batch_size, input_depth], dtype=tf.float32),
            lambda: inputs_ta.read(time))
        next_loop_state = None
        return (elements_finished, next_input, next_cell_state,
                emit_output, next_loop_state)

    outputs_ta, final_state, _ = tf.nn.raw_rnn(cell, loop_fn)
    outputs = outputs_ta.pack()
Args:
cell: An instance of RNNCell.
loop_fn: A callable that takes inputs (time, cell_output, cell_state, loop_state) and returns the tuple (finished, next_input, next_cell_state, emit_output, next_loop_state). Here time is an int32 scalar Tensor, cell_output is a Tensor or (possibly nested) tuple of tensors as determined by cell.output_size, and cell_state is a Tensor or (possibly nested) tuple of tensors, as determined by the loop_fn on its first call (and should match cell.state_size). The outputs are: finished, a boolean Tensor of shape [batch_size], next_input: the next input to feed to cell, next_cell_state: the next state to feed to cell, and emit_output: the output to store for this iteration.
Note that emit_output should be a Tensor or (possibly nested) tuple of tensors with shapes and structure matching cell.output_size and cell_output above. The parameter cell_state and output next_cell_state may be either a single or (possibly nested) tuple of tensors. The parameter loop_state and output next_loop_state may be either a single or (possibly nested) tuple of Tensor and TensorArray objects. This last parameter may be ignored by loop_fn and the return value may be None. If it is not None, then the loop_state will be propagated through the RNN loop, for use purely by loop_fn to keep track of its own state. The next_loop_state parameter returned may be None.
The first call to loop_fn will be time = 0, cell_output = None, cell_state = None, and loop_state = None. For this call: The next_cell_state value should be the value with which to initialize the cell's state. It may be a final state from a previous RNN or it may be the output of cell.zero_state(). It should be a (possibly nested) tuple structure of tensors. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a TensorShape, this must be a Tensor of appropriate type and shape [batch_size] + cell.state_size. If cell.state_size is a (possibly nested) tuple of ints or TensorShape, this will be a tuple having the corresponding shapes. The emit_output value may be either None or a (possibly nested) tuple structure of tensors, e.g., (tf.zeros(shape_0, dtype=dtype_0), tf.zeros(shape_1, dtype=dtype_1)). If this first emit_output return value is None, then the emit_ta result of raw_rnn will have the same structure and dtypes as cell.output_size. Otherwise emit_ta will have the same structure, shapes (prepended with a batch_size dimension), and dtypes as emit_output. The actual values returned for emit_output at this initializing call are ignored. Note, this emit structure must be consistent across all time steps.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
scope: VariableScope for the created subgraph; defaults to "RNN".
Returns:
A tuple (emit_ta, final_state, final_loop_state) where:
emit_ta: The RNN output TensorArray. If loop_fn returns a (possibly nested) set of Tensors for emit_output during initialization (inputs time = 0, cell_output = None, and loop_state = None), then emit_ta will have the same structure, dtypes, and shapes as emit_output instead. If loop_fn returns emit_output = None during this call, the structure of cell.output_size is used: If cell.output_size is a (possibly nested) tuple of integers or TensorShape objects, then emit_ta will be a tuple having the same structure as cell.output_size, containing TensorArrays whose elements' shapes correspond to the shape data in cell.output_size.
final_state: The final cell state. If cell.state_size is an int, this will be shaped [batch_size, cell.state_size]. If it is a TensorShape, this will be shaped [batch_size] + cell.state_size. If it is a (possibly nested) tuple of ints or TensorShape, this will be a tuple having the corresponding shapes.
final_loop_state: The final loop state as returned by loop_fn.
Raises:
TypeError: If cell is not an instance of RNNCell, or loop_fn is not a callable.