Transformer encoder.

The Transformer encoder is made up of N identical layers. Each layer is composed of the following sublayers:

  1. Self-attention layer
  2. Feedforward network (two fully-connected layers)
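The two sublayers above can be sketched as follows. This is a minimal, framework-agnostic NumPy illustration, not the library's implementation: it uses a single attention head and omits the learned query/key/value projections, and the `encoder_layer` helper and its weight arguments (`w1`, `b1`, `w2`, `b2`) are hypothetical names introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, mask=None):
    # Single-head scaled dot-product self-attention. A real layer
    # first projects x to queries, keys, and values; omitted here.
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ x

def feedforward(x, w1, b1, w2, b2):
    # Two fully-connected layers with a ReLU in between.
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

def layer_norm(y, eps=1e-6):
    mu = y.mean(-1, keepdims=True)
    return (y - mu) / np.sqrt(y.var(-1, keepdims=True) + eps)

def encoder_layer(x, w1, b1, w2, b2, mask=None):
    # Post-norm ordering: sublayer -> residual add -> layer norm.
    x = layer_norm(x + self_attention(x, mask))
    return layer_norm(x + feedforward(x, w1, b1, w2, b2))
```

Both sublayers preserve the input shape, which is what lets N such layers be stacked.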

num_layers: Number of layers.
num_attention_heads: Number of attention heads.
intermediate_size: Size of the intermediate (feedforward) layer.
activation: Activation for the intermediate layer.
dropout_rate: Dropout probability.
attention_dropout_rate: Dropout probability for the attention layers.
use_bias: Whether to enable the bias term in the attention layer. If False, the attention layer uses no bias.
norm_first: Whether to apply layer normalization to the inputs of the attention and intermediate dense layers (pre-norm). If False, normalization is applied to their outputs instead (post-norm).
norm_epsilon: Epsilon value for the normalization layers.
intermediate_dropout: Dropout probability for the intermediate dropout layer.
**kwargs: Keyword arguments passed to tf.keras.layers.Layer.
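The norm_first argument decides where layer normalization sits relative to each sublayer. A minimal sketch of the two orderings, assuming a generic `sublayer` callable (attention or feedforward) and a hypothetical `apply_sublayer` helper introduced here for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def apply_sublayer(x, sublayer, norm_first):
    # norm_first=True:  x + sublayer(norm(x))   (pre-norm)
    # norm_first=False: norm(x + sublayer(x))   (post-norm)
    if norm_first:
        return x + sublayer(layer_norm(x))
    return layer_norm(x + sublayer(x))
```

Note that with pre-norm the residual path is left unnormalized, so a zero-valued sublayer passes the input through unchanged; with post-norm the sum is always renormalized.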


Returns the output of the encoder.

encoder_inputs A tensor with shape (batch_size, input_length, hidden_size).
attention_mask A mask for the encoder self-attention layer with shape (batch_size, input_length, input_length).

Output of the encoder, which is a float32 tensor with shape (batch_size, input_length, hidden_size).
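To illustrate how the attention_mask shape (batch_size, input_length, input_length) is used: entry [b, i, j] says whether query position i may attend to key position j. A toy NumPy sketch, assuming a boolean mask where masked positions are filled with a large negative value before the softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Uniform toy attention scores for one batch element.
batch_size, input_length = 1, 4
scores = np.zeros((batch_size, input_length, input_length))

# Mask with shape (batch_size, input_length, input_length):
# here the last position is padding, so no query may attend to it.
attention_mask = np.ones((batch_size, input_length, input_length), dtype=bool)
attention_mask[:, :, -1] = False

masked = np.where(attention_mask, scores, -1e9)
weights = softmax(masked)
# Attention weight on the padded position is (numerically) zero,
# and each row still sums to 1 over the remaining positions.
```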