Single transformer layer for decoder.
tfm.nlp.layers.TransformerDecoderBlock(
    num_attention_heads,
    intermediate_size,
    intermediate_activation,
    dropout_rate=0.0,
    attention_dropout_rate=0.0,
    multi_channel_cross_attention=False,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    use_bias=True,
    norm_first=False,
    norm_epsilon=1e-12,
    intermediate_dropout=0.0,
    attention_initializer=None,
    **kwargs
)
It has three sub-layers: (1) a multi-head self-attention mechanism, (2) an encoder-decoder (cross) attention, and (3) a position-wise fully connected feed-forward network.
Args | |
---|---|
num_attention_heads | Number of attention heads.
intermediate_size | Size of the intermediate layer.
intermediate_activation | Activation for the intermediate layer.
dropout_rate | Dropout probability for the post-attention and output dropout.
attention_dropout_rate | Dropout probability within the attention layer.
multi_channel_cross_attention | Whether to use MultiChannelAttention for cross-attention between target sequences and source sequences.
kernel_initializer | Initializer for dense layer kernels.
bias_initializer | Initializer for dense layer biases.
kernel_regularizer | Regularizer for dense layer kernels.
bias_regularizer | Regularizer for dense layer biases.
activity_regularizer | Regularizer for dense layer activity.
kernel_constraint | Constraint for dense layer kernels.
bias_constraint | Constraint for dense layer biases.
use_bias | Whether to use bias in the attention layer. If set to False, bias is disabled in the attention layer.
norm_first | Whether to normalize the inputs to the attention and intermediate dense layers (pre-norm). If set to False, the outputs of the attention and intermediate dense layers are normalized instead (post-norm).
norm_epsilon | Epsilon value used to initialize normalization layers.
intermediate_dropout | Dropout probability for the intermediate dropout layer.
attention_initializer | Initializer for attention layer kernels. If set to None, attention layers use kernel_initializer as their kernel initializer.
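A minimal instantiation sketch; the hyperparameter values below (8 heads, an intermediate size of 2048, ReLU activation, 0.1 dropout) are illustrative assumptions, not defaults:

```python
import tensorflow_models as tfm

# Illustrative hyperparameters; choose values to match your model.
decoder_block = tfm.nlp.layers.TransformerDecoderBlock(
    num_attention_heads=8,
    intermediate_size=2048,
    intermediate_activation='relu',
    dropout_rate=0.1,
    attention_dropout_rate=0.1,
    norm_first=True)  # pre-norm variant; the default (False) is post-norm
```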
Methods
call
call(
    inputs, cache=None, decode_loop_step=None
)
This is where the layer's logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or in the build() method that is called automatically before call() executes for the first time.
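As a quick illustration of this rule (a generic Keras pattern, shown with a hypothetical layer that is not part of this API): variables belong in build(), while call() only uses them.

```python
import tensorflow as tf

class ScaledOutput(tf.keras.layers.Layer):  # hypothetical example layer
  def build(self, input_shape):
    # State is created once here, before call() runs for the first time.
    self.scale = self.add_weight(name='scale', shape=(), initializer='ones')

  def call(self, inputs):
    # call() only uses existing state; it never creates variables.
    return inputs * self.scale
```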
Args | |
---|---|
inputs | Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules.
*args | Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs | Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference. mask: Boolean input mask. If the layer's call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
Returns | |
---|---|
A tensor or list/tuple of tensors. |
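A hedged forward-pass sketch. The four-element packing of inputs (target tensor, encoder memory, cross-attention mask, self-attention mask) and the tensor shapes are assumptions, since the expected packing is not documented above; verify the ordering against the layer's source before relying on it.

```python
import tensorflow as tf
import tensorflow_models as tfm

decoder_block = tfm.nlp.layers.TransformerDecoderBlock(
    num_attention_heads=8, intermediate_size=2048,
    intermediate_activation='relu')

batch, tgt_len, src_len, hidden = 2, 16, 32, 512        # illustrative sizes
targets = tf.random.uniform((batch, tgt_len, hidden))   # decoder input
memory = tf.random.uniform((batch, src_len, hidden))    # encoder output
cross_mask = tf.ones((batch, tgt_len, src_len))          # target-to-source attention mask
self_mask = tf.linalg.band_part(                         # causal self-attention mask
    tf.ones((batch, tgt_len, tgt_len)), -1, 0)

# Assumed packing: [targets, memory, cross_attention_mask, self_attention_mask].
outputs = decoder_block([targets, memory, cross_mask, self_mask])
```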
common_layers_with_encoder
common_layers_with_encoder()
Gets layer objects that can make a Transformer encoder block.
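Continuing the sketch above, a brief usage note: the returned sub-layer objects can be reused to build a Transformer encoder block that shares weights with this decoder. Whether the block must be built (called once) before the sub-layers exist is an assumption here; check the layer's source.

```python
# Sub-layers (self-attention, feed-forward, normalization) shared with an
# encoder block; assumes decoder_block from the sketch above has been built.
shared_layers = decoder_block.common_layers_with_encoder()
```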