

Gated linear feedforward layer.

This layer follows the paper "GLU Variants Improve Transformer" (https://arxiv.org/abs/2002.05202). In addition, it allows stacking multiple feedforward blocks and specifying the position of the dropout layer.

intermediate_size Size of the intermediate layer.
intermediate_activation Activation for the intermediate layer.
dropout Dropout probability for the output dropout.
use_gate Whether to use gated linear units. If True, assuming GELU as the activation and omitting bias, will apply GEGLU(x, W, V, W_2) = (GELU(xW) * xV)W_2; if False, will follow the "Attention Is All You Need" paper (https://arxiv.org/abs/1706.03762) and apply FFN(x, W_1, W_2) = GELU(xW_1)W_2.
num_blocks The number of feedforward blocks to stack. Each block contains a (gated) linear layer and a fully connected layer, followed by dropout, layer norm and residual.
dropout_position Where to apply the dropout; the value can be either before_residual or after_residual. If before_residual, will apply layer_output = layer_norm(dropout(layer_output) + layer_input); if after_residual, will apply layer_output = dropout(layer_norm(layer_output + layer_input)).
kernel_initializer Initializer for dense layer kernels.
bias_initializer Initializer for dense layer biases.
kernel_regularizer Regularizer for dense layer kernels.
bias_regularizer Regularizer for dense layer biases.
activity_regularizer Regularizer for dense layer activity.
kernel_constraint Constraint for dense layer kernels.
bias_constraint Constraint for dense layer biases.
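The two formulas selected by use_gate can be compared with a small pure-Python sketch. This is not the layer's implementation: the helper names (gelu, matvec, ffn, ffn_geglu) and the toy weights are hypothetical, biases are omitted as in the formulas, and the exact erf-based GELU is an assumption (the library's activation may use an approximation).

```python
import math

def gelu(v):
    # Exact GELU, x * Phi(x), applied elementwise (assumed variant).
    return [0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in v]

def matvec(x, w):
    # Row vector x (length n) times matrix w (n rows, m columns).
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

def ffn(x, w1, w2):
    # use_gate=False: FFN(x, W_1, W_2) = GELU(x W_1) W_2
    return matvec(gelu(matvec(x, w1)), w2)

def ffn_geglu(x, w, v, w2):
    # use_gate=True: GEGLU(x, W, V, W_2) = (GELU(x W) * x V) W_2
    activated = gelu(matvec(x, w))   # GELU(x W)
    linear = matvec(x, v)            # x V
    return matvec([a * b for a, b in zip(activated, linear)], w2)

# Toy weights: hidden size 2, intermediate size 3 (illustrative values only).
x = [1.0, -2.0]
w = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
v = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]     # chosen so that x V is all ones
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

plain = ffn(x, w, w2)
gated = ffn_geglu(x, w, v, w2)
```

With these toy weights the linear branch xV is all ones, so the gated output coincides with the ungated one. In general the extra projection V lets the block scale each intermediate unit by a learned linear function of the input.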




This is where the layer's logic lives.

Note that the call() method in tf.keras differs slightly from the Keras API. In the Keras API, you can pass masking support for layers as additional arguments, whereas tf.keras has a compute_mask() method to support masking.

inputs Input tensor, or list/tuple of input tensors.
*args Additional positional arguments. Currently unused.
**kwargs Additional keyword arguments. Currently unused.

A tensor or list/tuple of tensors.
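The ordering that dropout_position selects at the end of each block can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the layer's code: the function names are hypothetical, layer norm is shown without learned scale and offset, the epsilon value is arbitrary, and dropout takes an explicit keep mask so the example is deterministic.

```python
import math

def layer_norm(v, eps=1e-12):
    # Normalize to zero mean and unit variance (no learned scale/offset here).
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def dropout(v, keep_mask, rate):
    # Inverted dropout with an explicit 0/1 keep mask instead of random sampling.
    return [a * m / (1.0 - rate) for a, m in zip(v, keep_mask)]

def block_output(layer_output, layer_input, keep_mask, rate, dropout_position):
    if dropout_position == "before_residual":
        # layer_norm(dropout(layer_output) + layer_input)
        dropped = dropout(layer_output, keep_mask, rate)
        return layer_norm([d + i for d, i in zip(dropped, layer_input)])
    if dropout_position == "after_residual":
        # dropout(layer_norm(layer_output + layer_input))
        normed = layer_norm([o + i for o, i in zip(layer_output, layer_input)])
        return dropout(normed, keep_mask, rate)
    raise ValueError("unknown dropout_position: %r" % (dropout_position,))

layer_input = [1.0, 2.0, 3.0]
layer_output = [0.5, -0.5, 1.0]
mask = [1.0, 0.0, 1.0]   # the second unit is dropped

before = block_output(layer_output, layer_input, mask, 0.5, "before_residual")
after = block_output(layer_output, layer_input, mask, 0.5, "after_residual")
```

In the before_residual case the dropped unit still carries the residual input through layer norm, while in the after_residual case the dropped position of the final output is exactly zero.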