BigBird, a sparse attention mechanism.
tfm.nlp.layers.BigBirdAttention(
num_rand_blocks=3,
from_block_size=64,
to_block_size=64,
max_rand_mask_length=MAX_SEQ_LEN,
seed=None,
**kwargs
)
This layer follows the paper "Big Bird: Transformers for Longer Sequences" (https://arxiv.org/abs/2007.14062). It reduces the quadratic dependency of attention computation on sequence length to linear.
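As a back-of-envelope illustration of that claim (not part of this API; the block counts follow the paper's sliding-window, global, and random block pattern, and the sizes are illustrative), the number of attention scores computed per query block is constant, so total work grows linearly with sequence length:

import math

def full_attention_scores(n):
    return n * n  # every query attends to every key: O(n^2)

def bigbird_scores(n, block_size=64, num_rand_blocks=3):
    blocks = n // block_size
    # each query block sees: itself plus 2 sliding-window neighbours,
    # 2 global blocks, and num_rand_blocks random blocks
    attended = 3 + 2 + num_rand_blocks  # constant per block
    return blocks * attended * block_size * block_size  # O(n)

for n in (1024, 2048, 4096):
    print(n, full_attention_scores(n), bigbird_scores(n))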
Arguments are the same as for the MultiHeadAttention layer.
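For instance, a minimal construction sketch (the num_heads and key_dim values are illustrative MultiHeadAttention arguments, and the tensorflow_models import assumes the tf-models-official package is installed):

import tensorflow_models as tfm

MAX_SEQ_LEN = 1024  # hypothetical maximum sequence length

attention = tfm.nlp.layers.BigBirdAttention(
    num_heads=8,   # standard MultiHeadAttention argument
    key_dim=64,    # standard MultiHeadAttention argument
    num_rand_blocks=3,
    from_block_size=64,
    to_block_size=64,
    max_rand_mask_length=MAX_SEQ_LEN,
)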
Methods
call
call(
query, value, key=None, attention_mask=None, **kwargs
)
This is where the layer's logic lives.
The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or in the build() method that is called automatically before call() executes for the first time.
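As a generic illustration of that guidance (a sketch of the standard Keras pattern, not specific to BigBird), a layer that creates its weight in build() rather than in call():

import tensorflow as tf

class Scale(tf.keras.layers.Layer):
    def build(self, input_shape):
        # State is created once, before the first call() runs.
        self.alpha = self.add_weight(
            name="alpha", shape=(input_shape[-1],), initializer="ones")

    def call(self, inputs):
        # call() only computes; it creates no new state.
        return inputs * self.alpha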
| Args | |
|---|---|
| inputs | Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules. |
| *args | Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above. |
| **kwargs | Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: training, a Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference; and mask, a Boolean input mask. If the layer's call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs came from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support). |
| Returns |
|---|
| A tensor or list/tuple of tensors. |
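An end-to-end usage sketch: in the Model Garden this layer is typically paired with tfm.nlp.layers.BigBirdMasks, which turns a padding mask into the block masks that BigBirdAttention consumes via attention_mask. The sizes below are illustrative and the exact mask shapes follow the Model Garden source, so treat this as a sketch rather than a definitive recipe:

import tensorflow as tf
import tensorflow_models as tfm

MAX_SEQ_LEN, HIDDEN, BLOCK = 1024, 512, 64  # illustrative sizes

attention = tfm.nlp.layers.BigBirdAttention(
    num_heads=8, key_dim=64,
    num_rand_blocks=3, from_block_size=BLOCK, to_block_size=BLOCK,
    max_rand_mask_length=MAX_SEQ_LEN)
masks = tfm.nlp.layers.BigBirdMasks(block_size=BLOCK)

x = tf.random.normal([2, MAX_SEQ_LEN, HIDDEN])  # [batch, seq, hidden]
padding = tf.ones([2, MAX_SEQ_LEN])             # 1 = real token, 0 = pad
out = attention(x, x, attention_mask=masks(x, padding))
print(out.shape)  # expect (2, 1024, 512)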