
Group LSTM cell (G-LSTM).

Inherits From: RNNCell

The implementation is based on:

O. Kuchaiev and B. Ginsburg "Factorization Tricks for LSTM Networks", ICLR 2017 workshop.

In brief, a G-LSTM cell consists of one LSTM sub-cell per group, where each sub-cell operates on an evenly-sized sub-vector of the input and produces an evenly-sized sub-vector of the output. For example, a G-LSTM cell with 128 units and 4 groups consists of 4 LSTM sub-cells with 32 units each. If that G-LSTM cell is fed a 200-dim input, then each sub-cell receives a 50-dim part of the input and produces a 32-dim part of the output.
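The split described above can be traced with a small NumPy sketch. It only illustrates the shape bookkeeping: `grouped_forward` is a hypothetical helper, not part of the library, and a random linear map followed by `tanh` stands in for each LSTM sub-cell. It assumes the input width and `num_units` both divide evenly by `number_of_groups`.

```python
import numpy as np


def grouped_forward(x, num_units, number_of_groups, seed=0):
    """Shape-only sketch of a grouped cell (hypothetical helper).

    Each group gets an equal slice of the input and produces an equal
    slice of the output. The per-group transform is a placeholder
    linear map, NOT an LSTM.
    """
    units_per_group = num_units // number_of_groups
    rng = np.random.default_rng(seed)

    # np.split raises ValueError if the input width is not evenly divisible.
    parts = np.split(x, number_of_groups, axis=1)

    outputs = []
    for part in parts:
        # Placeholder weights of shape [input slice width, units per group].
        w = rng.standard_normal((part.shape[1], units_per_group))
        outputs.append(np.tanh(part @ w))

    # Concatenate the per-group outputs back into one vector per example.
    return parts, np.concatenate(outputs, axis=1)


x = np.ones((2, 200))                    # batch of 2, 200-dim input
parts, y = grouped_forward(x, num_units=128, number_of_groups=4)
print([p.shape for p in parts])          # four slices of shape (2, 50)
print(y.shape)                           # (2, 128)
```

With the numbers from the example, each of the 4 groups sees a 50-dim slice and contributes 32 of the 128 output dimensions.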

num_units int, the number of units in the G-LSTM cell.
initializer (optional) The initializer to use for the weight and projection matrices.
num_proj (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
number_of_groups (optional) int, the number of groups to use. If number_of_groups is 1, the cell should be equivalent to a standard LSTM cell.
forget_bias Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of training.
activation Activation function of the inner states.
reuse (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.

ValueError If num_units or num_proj is not divisible by number_of_groups.
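The divisibility condition can be expressed as a short check. This is a minimal sketch of the documented rule, not the library's own code; `check_group_divisibility` and its error messages are hypothetical.

```python
def check_group_divisibility(num_units, number_of_groups, num_proj=None):
    """Raise ValueError if the sizes cannot be split evenly across groups.

    Hypothetical helper mirroring the documented rule: num_units, and
    num_proj when it is given, must be divisible by number_of_groups.
    """
    if num_units % number_of_groups != 0:
        raise ValueError(
            "num_units (%d) must be divisible by number_of_groups (%d)"
            % (num_units, number_of_groups))
    if num_proj is not None and num_proj % number_of_groups != 0:
        raise ValueError(
            "num_proj (%d) must be divisible by number_of_groups (%d)"
            % (num_proj, number_of_groups))


check_group_divisibility(128, 4)               # passes: 128 / 4 = 32
check_group_divisibility(128, 4, num_proj=64)  # passes: 64 / 4 = 16

try:
    check_group_divisibility(130, 4)           # 130 is not a multiple of 4
except ValueError as err:
    print("rejected:", err)
```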


output_size Integer or TensorShape: size of outputs produced by this cell.

state_size size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.




Return zero-filled state tensor(s).

batch_size int, float, or unit Tensor representing the batch size.
dtype the data type to use for the state.

If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.
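The two cases above can be mimicked in NumPy to make the resulting shapes concrete. This is a rough sketch under simplifying assumptions: `zero_state_like` is a hypothetical helper, it uses NumPy arrays in place of tensors, and it handles only ints and plain nested lists or tuples (not TensorShape objects or named tuples).

```python
import numpy as np


def zero_state_like(state_size, batch_size, dtype=np.float32):
    """Build zero-filled state array(s) matching state_size (hypothetical).

    An int yields one [batch_size, state_size] array. A nested list or
    tuple yields the same nesting, with a [batch_size, s] array for each
    int s it contains.
    """
    if isinstance(state_size, (list, tuple)):
        # Recurse and rebuild the same container type (list or tuple).
        return type(state_size)(
            zero_state_like(s, batch_size, dtype) for s in state_size)
    return np.zeros((batch_size, state_size), dtype=dtype)


single = zero_state_like(128, batch_size=4)
nested = zero_state_like((128, (64, 32)), batch_size=4)

print(single.shape)        # (4, 128)
print(nested[0].shape)     # (4, 128)
print(nested[1][1].shape)  # (4, 32)
```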