Transformer encoder.
tfm.nlp.models.TransformerEncoder(
num_layers=6,
num_attention_heads=8,
intermediate_size=2048,
activation='relu',
dropout_rate=0.0,
attention_dropout_rate=0.0,
use_bias=False,
norm_first=True,
norm_epsilon=1e-06,
intermediate_dropout=0.0,
**kwargs
)
The Transformer encoder is made up of N identical layers. Each layer is composed
of two sublayers:
- Self-attention layer
- Feedforward network (two fully-connected layers)
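For intuition, here is a minimal sketch of one such pre-norm (norm_first=True) layer.
This is not the library's implementation; the class name, defaults, and details such
as mask handling are illustrative only.

import tensorflow as tf

class EncoderBlockSketch(tf.keras.layers.Layer):
    """Illustrative pre-norm encoder layer: self-attention + 2-layer feedforward."""

    def __init__(self, hidden_size=512, num_heads=8, intermediate_size=2048,
                 dropout_rate=0.0, norm_epsilon=1e-6):
        super().__init__()
        self.attn_norm = tf.keras.layers.LayerNormalization(epsilon=norm_epsilon)
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=hidden_size // num_heads)
        self.ffn_norm = tf.keras.layers.LayerNormalization(epsilon=norm_epsilon)
        self.ffn_inner = tf.keras.layers.Dense(intermediate_size, activation='relu')
        self.ffn_outer = tf.keras.layers.Dense(hidden_size)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, attention_mask=None):
        # Sublayer 1: self-attention with a residual connection (inputs normalized first).
        y = self.attn_norm(x)
        x = x + self.dropout(self.attn(y, y, attention_mask=attention_mask))
        # Sublayer 2: feedforward network (two fully-connected layers), also residual.
        y = self.ffn_norm(x)
        return x + self.dropout(self.ffn_outer(self.ffn_inner(y)))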
Args
  num_layers: Number of layers.
  num_attention_heads: Number of attention heads.
  intermediate_size: Size of the intermediate (feedforward) layer.
  activation: Activation for the intermediate layer.
  dropout_rate: Dropout probability.
  attention_dropout_rate: Dropout probability for the attention layers.
  use_bias: Whether the attention layers use bias terms. If False, bias is
    disabled in the attention layers.
  norm_first: Whether to normalize the inputs to the attention and intermediate
    dense layers. If False, the outputs of the attention and intermediate dense
    layers are normalized instead.
  norm_epsilon: Epsilon value for initializing the normalization layers.
  intermediate_dropout: Dropout probability for the intermediate dropout layer.
  **kwargs: Keyword arguments passed to tf.keras.layers.Layer.
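For example, an encoder matching the signature above can be constructed as follows
(assuming the Model Garden package is installed and imported as tfm; the dropout
values here are illustrative):

import tensorflow_models as tfm

# 6 pre-norm layers, 8 attention heads, 2048-wide feedforward (the defaults shown above).
encoder = tfm.nlp.models.TransformerEncoder(
    num_layers=6,
    num_attention_heads=8,
    intermediate_size=2048,
    dropout_rate=0.1,
    attention_dropout_rate=0.1,
)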
Methods
call
call(
encoder_inputs, attention_mask=None
)
Return the output of the encoder.
Args
  encoder_inputs: A tensor with shape (batch_size, input_length, hidden_size).
  attention_mask: A mask for the encoder self-attention layer with shape
    (batch_size, input_length, input_length).
Returns
  Output of the encoder: a float32 tensor with shape
  (batch_size, input_length, hidden_size).
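A minimal usage sketch with the shapes documented above. The tensor sizes are
illustrative, and the all-ones mask simply lets every position attend to every
other position:

import tensorflow as tf
import tensorflow_models as tfm

batch_size, input_length, hidden_size = 2, 16, 512

encoder = tfm.nlp.models.TransformerEncoder(num_layers=6, num_attention_heads=8)

# Inputs are assumed to already be embedded to hidden_size.
encoder_inputs = tf.random.uniform((batch_size, input_length, hidden_size))

# Mask with shape (batch_size, input_length, input_length); ones attend everywhere.
attention_mask = tf.ones((batch_size, input_length, input_length))

outputs = encoder(encoder_inputs, attention_mask=attention_mask)
print(outputs.shape)  # (2, 16, 512)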