Funnel Transformer-based encoder network.
tfm.nlp.networks.FunnelTransformerEncoder(
    vocab_size: int,
    hidden_size: int = 768,
    num_layers: int = 12,
    num_attention_heads: int = 12,
    max_sequence_length: int = 512,
    type_vocab_size: int = 16,
    inner_dim: int = 3072,
    inner_activation: _Activation = _approx_gelu,
    output_dropout: float = 0.1,
    attention_dropout: float = 0.1,
    pool_type: str = _MAX,
    pool_stride: Union[int, Sequence[Union[int, float]]] = 2,
    unpool_length: int = 0,
    initializer: _Initializer = tf.keras.initializers.TruncatedNormal(stddev=0.02),
    output_range: Optional[int] = None,
    embedding_width: Optional[int] = None,
    embedding_layer: Optional[tf.keras.layers.Layer] = None,
    norm_first: bool = False,
    transformer_cls: Union[str, tf.keras.layers.Layer] = tfm.nlp.layers.TransformerEncoderBlock,
    share_rezero: bool = False,
    append_dense_inputs: bool = False,
    **kwargs
)
Funnel Transformer implementation of https://arxiv.org/abs/2006.03236.
This implementation utilizes the base framework with BERT (https://arxiv.org/abs/1810.04805). Its output is compatible with BertEncoder.
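A minimal usage sketch, not part of the original reference: it assumes the tensorflow_models import path and the BertEncoder-compatible input/output dict keys (input_word_ids, input_mask, input_type_ids; sequence_output, pooled_output).

# Minimal sketch: build the encoder and run a forward pass.
# The dict keys below are assumptions based on BertEncoder compatibility.
import tensorflow as tf
import tensorflow_models as tfm

encoder = tfm.nlp.networks.FunnelTransformerEncoder(
    vocab_size=30522,
    num_layers=6,
    pool_stride=2,  # the sequence length is compressed by the stride at each pooled layer
)

batch_size, seq_len = 2, 128
inputs = dict(
    input_word_ids=tf.ones((batch_size, seq_len), dtype=tf.int32),
    input_mask=tf.ones((batch_size, seq_len), dtype=tf.int32),
    input_type_ids=tf.zeros((batch_size, seq_len), dtype=tf.int32),
)
outputs = encoder(inputs)
print(outputs["sequence_output"].shape)  # shortened sequence after pooling
print(outputs["pooled_output"].shape)    # [batch_size, hidden_size]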
Args
vocab_size: The size of the token vocabulary.
hidden_size: The size of the transformer hidden layers.
num_layers: The number of transformer layers.
num_attention_heads: The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads.
max_sequence_length: The maximum sequence length that this encoder can consume. If None, max_sequence_length uses the value from the sequence length. This determines the variable shape for positional embeddings.
type_vocab_size: The number of types that the 'type_ids' input can take.
inner_dim: The output dimension of the first Dense layer in a two-layer feedforward network for each transformer.
inner_activation: The activation for the first Dense layer in a two-layer feedforward network for each transformer.
output_dropout: Dropout probability for the post-attention and output dropout.
attention_dropout: The dropout rate to use for the attention layers within the transformer layers.
pool_type: Pooling type. Choose from ['max', 'avg', 'truncated_avg'].
pool_stride: An int or a list of ints. Pooling stride(s) used to compress the sequence length. If set to an int, each layer uses the same stride size. If set to a list, the number of elements must match num_layers (see the sketch after this list).
unpool_length: Leading n tokens to be skipped from pooling.
initializer: The initializer to use for all weights in this encoder.
output_range: The sequence output range, [0, output_range), obtained by slicing the target sequence of the last transformer layer. None means the entire target sequence will attend to the source sequence, which yields the full output.
embedding_width: The width of the word embeddings. If the embedding width is not equal to hidden_size, embedding parameters will be factorized into two matrices in the shapes of ['vocab_size', 'embedding_width'] and ['embedding_width', 'hidden_size'].
embedding_layer: An optional Layer instance which will be called to generate embeddings for the input word IDs.
norm_first: Whether to normalize inputs to the attention and intermediate dense layers. If set to False, the output of the attention and intermediate dense layers is normalized instead. This does not apply to ReZero.
transformer_cls: str or a Keras Layer. This is the base TransformerBlock the funnel encoder relies on.
share_rezero: bool. Whether to share the ReZero alpha between the attention layer and the FFN layer. This option is specific to ReZero.
append_dense_inputs: Whether to accept dense embeddings as the input.
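The interaction between pool_stride, num_layers, and unpool_length is easier to see in code. A hedged sketch with illustrative (not default) values:

import tensorflow_models as tfm

# Per-layer strides: the list length must equal num_layers.
# A stride of 1 leaves the sequence length unchanged at that layer.
encoder = tfm.nlp.networks.FunnelTransformerEncoder(
    vocab_size=30522,
    num_layers=4,
    pool_stride=[1, 2, 2, 2],  # no pooling in the first layer, then compress by 2 three times
    unpool_length=1,           # skip the leading token (e.g. [CLS]) when pooling
)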
Attributes
pooler_layer: The pooler dense layer after the transformer layers.
transformer_layers: List of Transformer layers in the encoder.
Methods
call
call(
    inputs, output_range: Optional[tf.Tensor] = None
)
This is where the layer's logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
Args
inputs: Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
  - inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
  - NumPy array or Python scalar values in inputs get cast as tensors.
  - Keras mask metadata is only collected from inputs.
  - Layers are built (build(input_shape) method) using shape info from inputs only.
  - input_spec compatibility is only checked against inputs.
  - Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
  - The SavedModel input specification is generated using inputs only.
  - Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args: Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs: Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
  - training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
  - mask: Boolean input mask. If the layer's call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
Returns
A tensor or list/tuple of tensors.
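As a hedged illustration, continuing the earlier sketch (same encoder and inputs): the reserved training keyword and the encoder-specific output_range argument can be passed together. Passing a plain Python int for output_range is an assumption; the signature annotates it as an optional tf.Tensor.

# Sketch: inference-mode call that keeps only the first position of the last layer.
outputs = encoder(inputs, output_range=1, training=False)
print(outputs["sequence_output"].shape)  # sequence dimension sliced to [0, 1)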
get_embedding_layer
get_embedding_layer()
get_embedding_table
get_embedding_table()
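A hedged sketch of how these accessors can be used, continuing the earlier examples: assuming get_embedding_table() returns the [vocab_size, embedding_width] word-embedding matrix and embedding_width equals hidden_size (the default), the table can be reused to tie a masked-language-model style output projection to the input embeddings.

# Sketch: reuse the shared embedding table as an output projection.
embedding_table = encoder.get_embedding_table()   # assumed shape [vocab_size, embedding_width]
hidden = outputs["sequence_output"]               # [batch, pooled_seq_len, hidden_size]
logits = tf.einsum("bsh,vh->bsv", hidden, embedding_table)  # [batch, pooled_seq_len, vocab_size]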