TransformerScaffold for packing optimization to stride over inputs.
tfm.nlp.layers.StridedTransformerScaffold(
    num_attention_heads,
    inner_dim=768,
    inner_activation=tfm.utils.activations.gelu,
    attention_cls=attention.MultiHeadAttention,
    attention_cfg=None,
    feedforward_cls=None,
    feedforward_cfg=None,
    dropout_rate=0.0,
    attention_dropout_rate=0.0,
    norm_first=False,
    norm_epsilon=1e-12,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)
call( inputs, stride: tf.Tensor, training=None )
This is where the layer's logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method, which is called automatically before call() executes for the first time.
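The recommendation above applies to any Keras layer; a minimal sketch with a toy custom layer (not part of this API) showing where state belongs:

```python
import tensorflow as tf

class ScaledDense(tf.keras.layers.Layer):
    """Toy layer illustrating the recommended places to create state."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units  # plain Python configuration belongs in __init__()

    def build(self, input_shape):
        # Variables are created here, once, using shape info from the
        # inputs -- not inside call(), which may run many times.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
        )

    def call(self, inputs):
        # call() only computes; it creates no new state.
        return tf.matmul(inputs, self.kernel)
```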
Args:
  inputs: Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules: it must be explicitly passed, and shape, dtype, and mask information is collected from it only.
  *args: Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
  **kwargs: Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: training (a boolean indicating whether call() is run in inference or training mode) and mask (a boolean input mask).

Returns:
  A tensor or list/tuple of tensors.
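One plausible reading of "stride over inputs" is that a stride of k carries forward only every k-th sequence position; the sketch below illustrates that idea with plain tensor ops (an assumption for illustration, not the layer's actual implementation, and stride_positions is a hypothetical helper):

```python
import tensorflow as tf

def stride_positions(x, stride):
    # Keep every `stride`-th position along the sequence axis.
    # x has shape [batch, seq_len, hidden]; illustrative only.
    return x[:, ::stride, :]

x = tf.reshape(tf.range(2 * 8 * 4, dtype=tf.float32), (2, 8, 4))
y = stride_positions(x, 2)  # shape [2, 4, 4]
```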