Transformer scaffold layer.
tfm.nlp.layers.TransformerScaffold(
num_attention_heads,
intermediate_size,
intermediate_activation,
attention_cls=attention.MultiHeadAttention,
attention_cfg=None,
feedforward_cls=None,
feedforward_cfg=None,
dropout_rate=0.0,
attention_dropout_rate=0.0,
norm_first=False,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
This layer implements the Transformer from "Attention Is All You Need" (https://arxiv.org/abs/1706.03762), with customizable attention and feedforward layers. Users can pass a class to `attention_cls`/`feedforward_cls` together with an associated config in `attention_cfg`/`feedforward_cfg`, in which case the scaffold instantiates the class with that config, or they can pass a layer instance directly to `attention_cls`/`feedforward_cls`.
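For example, a scaffold might be built with a custom attention class plus an explicit config. This is a minimal sketch: the `TalkingHeadsAttention` choice and the sizes below are illustrative assumptions, not values prescribed by the API.

```python
import tensorflow_models as tfm

# Sketch: pass a class to attention_cls and a config to attention_cfg.
# The scaffold is expected to instantiate it as attention_cls(**attention_cfg).
custom_block = tfm.nlp.layers.TransformerScaffold(
    num_attention_heads=8,
    intermediate_size=3072,
    intermediate_activation="gelu",
    attention_cls=tfm.nlp.layers.TalkingHeadsAttention,  # illustrative choice
    attention_cfg={
        "num_heads": 8,
        "key_dim": 96,   # hidden_size // num_heads, assuming hidden_size=768
        "dropout": 0.1,
        "name": "talking_heads_attention",
    },
)
```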
| Args | |
|---|---|
| `num_attention_heads` | Number of attention heads. |
| `intermediate_size` | Size of the intermediate layer. |
| `intermediate_activation` | Activation for the intermediate layer. |
| `attention_cls` | A class to instantiate the attention layer, or a layer instance. |
| `attention_cfg` | The config with which to instantiate `attention_cls`. Ignored if `attention_cls` is a layer instance or None. If `attention_cls` is a class but `attention_cfg` is None, the following kwargs are used to instantiate the attention instance: `{"num_heads": num_attention_heads, "key_dim": int(hidden_size // num_attention_heads), "dropout": attention_dropout_rate, "name": "self_attention"}`, where `hidden_size` is the input tensor's last dimension. |
| `feedforward_cls` | A class to instantiate the feedforward layer, or a layer instance. If None, the standard feedforward layer described in the "Attention Is All You Need" paper is used. If not None, the instantiated feedforward layer is expected to take the output of attention as input, and its output is this transformer layer's output. |
| `feedforward_cfg` | The config with which to instantiate `feedforward_cls`. Ignored if `feedforward_cls` is a layer instance or None. If `feedforward_cls` is a class but `feedforward_cfg` is None, the following kwargs are used to instantiate the feedforward instance: `{"intermediate_size": intermediate_size, "intermediate_activation": intermediate_activation, "dropout": dropout_rate, "name": "feedforward"}`. |
| `dropout_rate` | Dropout probability for the post-attention and output dropout. |
| `attention_dropout_rate` | Dropout probability within the attention layer. |
| `kernel_initializer` | Initializer for dense layer kernels. |
| `bias_initializer` | Initializer for dense layer biases. |
| `kernel_regularizer` | Regularizer for dense layer kernels. |
| `bias_regularizer` | Regularizer for dense layer biases. |
| `activity_regularizer` | Regularizer for dense layer activity. |
| `kernel_constraint` | Constraint for dense layer kernels. |
| `bias_constraint` | Constraint for dense layer biases. |
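To illustrate the default-config behavior described for `feedforward_cfg` above, a feedforward class can be passed on its own and the scaffold fills in the config from its own constructor arguments. This is a hedged sketch; `GatedFeedforward` is simply another layer class from the same module, used here as an example.

```python
import tensorflow_models as tfm

# Sketch: feedforward_cfg is left as None, so the documented default kwargs
# apply, i.e. roughly GatedFeedforward(intermediate_size=3072,
# intermediate_activation="gelu", dropout=0.1, name="feedforward").
block = tfm.nlp.layers.TransformerScaffold(
    num_attention_heads=8,
    intermediate_size=3072,
    intermediate_activation="gelu",
    dropout_rate=0.1,
    attention_dropout_rate=0.1,
    feedforward_cls=tfm.nlp.layers.GatedFeedforward,  # illustrative choice
)
```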
Methods
call
call(
inputs, training=None
)
This is where the layer's logic lives.
The `call()` method may not create state (except in its first invocation, wrapping the creation of variables or other resources in `tf.init_scope()`). It is recommended to create state in `__init__()`, or in the `build()` method that is called automatically before `call()` executes for the first time.
| Args | |
|---|---|
| `inputs` | Input tensor, or dict/list/tuple of input tensors. The first positional `inputs` argument is subject to special rules: |
| `*args` | Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above. |
| `**kwargs` | Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: `training`, a Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference; and `mask`, a Boolean input mask. If the layer's `call()` method takes a `mask` argument, its default value will be set to the mask generated for `inputs` by the previous layer (if `inputs` did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support). |
| Returns | |
|---|---|
| A tensor or list/tuple of tensors. | |
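A hedged usage sketch of `call()`: the shapes below are made up, and the list form `[input_tensor, attention_mask]` mirrors the related transformer layers in this module rather than anything stated in this section.

```python
import tensorflow as tf
import tensorflow_models as tfm

layer = tfm.nlp.layers.TransformerScaffold(
    num_attention_heads=8,
    intermediate_size=3072,
    intermediate_activation="gelu",
)

# Hypothetical shapes: batch of 2, sequence length 16, hidden size 768.
data = tf.random.uniform((2, 16, 768))
mask = tf.ones((2, 16, 16))  # per-example attention mask (assumed form)

output = layer([data, mask], training=False)  # or layer(data) with no mask
print(output.shape)  # (2, 16, 768)
```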