Transformer layer.
Inherits From: Transformer, TransformerEncoderBlock
tfm.nlp.layers.CompiledTransformer(
num_attention_heads,
intermediate_size,
intermediate_activation,
dropout_rate=0.0,
attention_dropout_rate=0.0,
output_range=None,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
use_bias=True,
norm_first=False,
norm_epsilon=1e-12,
intermediate_dropout=0.0,
attention_initializer=None,
**kwargs
)
This layer implements the Transformer from "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
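A minimal usage sketch, assuming the TensorFlow Model Garden package is installed and imported as `tfm`; the batch, sequence, and hidden sizes below are illustrative, not prescribed by this API:

```python
import tensorflow as tf
import tensorflow_models as tfm  # assumed import of the TensorFlow Model Garden package

# Illustrative sizes: batch of 2, sequence length 8, hidden size 64.
layer = tfm.nlp.layers.CompiledTransformer(
    num_attention_heads=4,            # should evenly divide the hidden size
    intermediate_size=256,
    intermediate_activation='relu',
    dropout_rate=0.1,
    attention_dropout_rate=0.1,
)

embeddings = tf.random.uniform((2, 8, 64))
outputs = layer(embeddings)           # output keeps the input shape: (2, 8, 64)
```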
Args | |
---|---|
`num_attention_heads` | Number of attention heads. |
`intermediate_size` | Size of the intermediate layer. |
`intermediate_activation` | Activation for the intermediate layer. |
`dropout_rate` | Dropout probability for the post-attention and output dropout. |
`attention_dropout_rate` | Dropout probability within the attention layer. |
`output_range` | The sequence output range, [0, output_range), obtained by slicing the target sequence. None means the target sequence is not sliced (see the sketch after this table). |
`kernel_initializer` | Initializer for dense layer kernels. |
`bias_initializer` | Initializer for dense layer biases. |
`kernel_regularizer` | Regularizer for dense layer kernels. |
`bias_regularizer` | Regularizer for dense layer biases. |
`activity_regularizer` | Regularizer for dense layer activity. |
`kernel_constraint` | Constraint for dense layer kernels. |
`bias_constraint` | Constraint for dense layer biases. |
`use_bias` | Whether the attention layer uses a bias term. If False, the attention layer is built without bias. |
`norm_first` | Whether to normalize the inputs to the attention and intermediate dense layers. If False, the outputs of the attention and intermediate dense layers are normalized instead. |
`norm_epsilon` | Epsilon value used to initialize the normalization layers. |
`intermediate_dropout` | Dropout probability for the intermediate dropout layer. |
`attention_initializer` | Initializer for the kernels of the attention layers. If None, the attention layers use `kernel_initializer` for their kernels. |
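Among these arguments, `output_range` is the one that changes the output shape; a sketch under the same illustrative assumptions as the earlier example (package imported as `tfm`, arbitrary sizes):

```python
import tensorflow as tf
import tensorflow_models as tfm  # assumed import of the TensorFlow Model Garden package

# output_range=1 restricts the target sequence to positions [0, 1).
sliced_layer = tfm.nlp.layers.CompiledTransformer(
    num_attention_heads=4,
    intermediate_size=256,
    intermediate_activation='relu',
    output_range=1,
)

outputs = sliced_layer(tf.random.uniform((2, 8, 64)))  # expected shape: (2, 1, 64)
```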
Methods
call
call(
inputs
)
Transformer self-attention encoder block call.
Args | |
---|---|
`inputs` | A single tensor or a list of tensors: `input tensor` as the single sequence of embeddings; [`input tensor`, `attention mask`] to add an attention mask; [`query tensor`, `key value tensor`, `attention mask`] to provide separate input streams for the query and the key/value of the multi-head attention. |
Returns | |
---|---|
An output tensor with the same dimensions as the input/query tensor. |
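A sketch of the three input forms listed above; the layer configuration, the tensor shapes, and the all-ones mask are illustrative assumptions:

```python
import tensorflow as tf
import tensorflow_models as tfm  # assumed import of the TensorFlow Model Garden package

layer = tfm.nlp.layers.CompiledTransformer(
    num_attention_heads=4, intermediate_size=256, intermediate_activation='relu')

embeddings = tf.random.uniform((2, 8, 64))   # (batch, seq_len, hidden)
mask = tf.ones((2, 8, 8))                    # (batch, target_len, source_len); 1 = attend

out_plain = layer(embeddings)                # single sequence of embeddings
out_masked = layer([embeddings, mask])       # with an additional attention mask

# Separate query and key/value streams for the multi-head attention.
query = tf.random.uniform((2, 8, 64))
out_cross = layer([query, embeddings, mask])  # output matches the query tensor's shape
```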