

Transformer encoder.

The Transformer encoder is made up of N identical layers. Each layer is composed of two sublayers:

  1. Self-attention layer
  2. Feedforward network (two fully connected layers)
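The two sublayers above can be sketched in NumPy. This is a simplified illustration, not the TensorFlow implementation: it uses single-head attention for brevity (the real layer is multi-head), and it omits dropout and normalization, keeping only the residual connections around each sublayer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_layer(x, wq, wk, wv, wo, w1, b1, w2, b2):
    """One encoder layer: self-attention, then a 2-layer feedforward network.

    x: [batch_size, input_length, hidden_size]
    """
    # Sublayer 1: scaled dot-product self-attention (single head here).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores) @ v @ wo
    x = x + attn  # residual connection around self-attention

    # Sublayer 2: feedforward network, i.e. two fully connected layers
    # with an activation (ReLU here) in between.
    hidden = np.maximum(0.0, x @ w1 + b1)
    return x + (hidden @ w2 + b2)  # residual connection around the FFN
```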

num_layers: Number of layers.
num_attention_heads: Number of attention heads.
intermediate_size: Size of the intermediate (feedforward) layer.
activation: Activation for the intermediate layer.
dropout_rate: Dropout probability.
attention_dropout_rate: Dropout probability for the attention layers.
use_bias: Whether to use bias in the attention layer. If set to False, bias is disabled in the attention layer.
norm_first: Whether to normalize the inputs to the attention and intermediate dense layers (pre-norm). If set to False, the outputs of the attention and intermediate dense layers are normalized instead (post-norm).
norm_epsilon: Epsilon value used to initialize the normalization layers.
intermediate_dropout: Dropout probability for the intermediate_dropout_layer.
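The norm_first and norm_epsilon arguments can be illustrated with a small NumPy sketch. This is an assumption-laden simplification of layer normalization as typically used in Transformers (without learned scale and bias), meant only to show where normalization is applied relative to each sublayer:

```python
import numpy as np

def layer_norm(x, epsilon=1e-6):
    # norm_epsilon guards against division by zero when the
    # variance of a feature vector is (near) zero.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

def sublayer_with_norm(x, sublayer, norm_first):
    if norm_first:
        # Pre-norm: normalize the input to the sublayer.
        return x + sublayer(layer_norm(x))
    # Post-norm: normalize the output of the sublayer.
    return layer_norm(x + sublayer(x))
```

Here `sublayer` stands for either the self-attention layer or the intermediate dense layers; both sublayers in each encoder layer follow the same norm_first convention.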




Returns the output of the encoder.

encoder_inputs: Tensor with shape [batch_size, input_length, hidden_size].
attention_mask: Mask for the encoder self-attention layer, with shape [batch_size, input_length, input_length].

Output of the encoder: a float32 tensor with shape [batch_size, input_length, hidden_size].
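A common convention for attention_mask (assumed here, since the source does not spell it out) is 1 where a position may be attended to and 0 where it may not. A minimal NumPy sketch of how such a mask is typically folded into the self-attention scores before the softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention_weights(scores, attention_mask):
    """Turn raw attention scores into attention weights, respecting the mask.

    scores:         [batch_size, input_length, input_length]
    attention_mask: same shape; 1 = attend, 0 = do not attend.
    """
    # Masked positions receive a large negative bias, so softmax
    # assigns them (near-)zero attention weight.
    biased = scores + (1.0 - attention_mask) * -1e9
    return softmax(biased)
```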