Transformer decoder.
tfm.nlp.models.TransformerDecoder(
num_layers=6,
num_attention_heads=8,
intermediate_size=2048,
activation='relu',
dropout_rate=0.0,
attention_dropout_rate=0.0,
use_bias=False,
norm_first=True,
norm_epsilon=1e-06,
intermediate_dropout=0.0,
**kwargs
)
Like the encoder, the decoder is made up of N identical layers. Each layer is composed of the sublayers:
- Self-attention layer
- Multi-headed attention layer combining encoder outputs with results from the previous self-attention layer.
- Feedforward network (2 fully-connected layers)
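Below is a minimal usage sketch, not an official example: it assumes the tf-models-official package is installed and imported as tensorflow_models (the tfm alias used on this page), and the layer sizes and tensor shapes are illustrative only.

```python
import tensorflow as tf
import tensorflow_models as tfm  # assumed import providing the tfm alias

# Small decoder stack; hyperparameters here are illustrative.
decoder = tfm.nlp.models.TransformerDecoder(
    num_layers=2,
    num_attention_heads=8,
    intermediate_size=2048,
    dropout_rate=0.1,
)

batch_size, target_length, input_length, hidden_size = 4, 10, 12, 512
target = tf.random.uniform((batch_size, target_length, hidden_size))
memory = tf.random.uniform((batch_size, input_length, hidden_size))

# Forward pass without masks; see the call() documentation below for
# the mask and cache arguments used in real training and decoding.
outputs = decoder(target, memory)
print(outputs.shape)  # (4, 10, 512)
```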
Methods
call
call(
target,
memory,
self_attention_mask=None,
cross_attention_mask=None,
cache=None,
decode_loop_step=None,
return_all_decoder_outputs=False
)
Return the output of the decoder layer stacks.
Args | Description
---|---
target | A tensor with shape (batch_size, target_length, hidden_size).
memory | A tensor with shape (batch_size, input_length, hidden_size).
self_attention_mask | A tensor with shape (batch_size, target_length, target_length), the mask for the decoder self-attention layer.
cross_attention_mask | A tensor with shape (batch_size, target_length, input_length), the mask for the encoder-decoder attention layer.
cache | (Used for fast decoding.) A nested dictionary storing previous decoder self-attention values. The items are: {layer_n: {"k": A tensor with shape (batch_size, i, key_channels), "v": A tensor with shape (batch_size, i, value_channels)}, ...}
decode_loop_step | An integer, the step number of the decoding loop. Used only for autoregressive inference on TPU.
return_all_decoder_outputs | Whether to return the outputs of all decoder layers. Note that the outputs are layer-normalized; this is useful when introducing a per-layer auxiliary loss.
Returns | Description
---|---
Output of the decoder | A float32 tensor with shape (batch_size, target_length, hidden_size).
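As a worked example of the mask arguments, the sketch below builds a causal self_attention_mask and a padding-based cross_attention_mask. It assumes the Model Garden convention that a mask value of 1 marks a position that may be attended to and 0 marks a blocked position; the shapes, padding mask, and hyperparameters are illustrative, not taken from this page.

```python
import tensorflow as tf
import tensorflow_models as tfm  # assumed import providing the tfm alias

batch_size, target_length, input_length, hidden_size = 4, 10, 12, 512
decoder = tfm.nlp.models.TransformerDecoder(
    num_layers=2, num_attention_heads=8, intermediate_size=2048)
target = tf.random.uniform((batch_size, target_length, hidden_size))
memory = tf.random.uniform((batch_size, input_length, hidden_size))

# Causal (lower-triangular) mask for decoder self-attention:
# target position t may attend only to positions <= t.
causal = tf.linalg.band_part(
    tf.ones((target_length, target_length), dtype=tf.float32), -1, 0)
self_attention_mask = tf.tile(causal[tf.newaxis, :, :], [batch_size, 1, 1])

# Cross-attention mask broadcast from a source padding mask
# (1 for real tokens, 0 for padding); all-ones here for illustration.
source_padding = tf.ones((batch_size, input_length), dtype=tf.float32)
cross_attention_mask = tf.tile(
    source_padding[:, tf.newaxis, :], [1, target_length, 1])

outputs = decoder(
    target,
    memory,
    self_attention_mask=self_attention_mask,
    cross_attention_mask=cross_attention_mask,
)
print(outputs.shape)  # (4, 10, 512)
```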