Base XLNet model.
tfm.nlp.networks.XLNetBase(
    vocab_size,
    num_layers,
    hidden_size,
    num_attention_heads,
    head_size,
    inner_size,
    dropout_rate,
    attention_dropout_rate,
    attention_type,
    bi_data,
    initializer,
    two_stream=False,
    tie_attention_biases=True,
    memory_length=None,
    clamp_length=-1,
    reuse_length=None,
    inner_activation='relu',
    use_cls_mask=False,
    embedding_width=None,
    **kwargs
)
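
For instance, a minimal construction sketch using the documented arguments (the hyperparameter values below are illustrative, not a published XLNet configuration):

```python
import tensorflow as tf
import tensorflow_models as tfm

# Illustrative hyperparameters; not a published XLNet configuration.
xlnet_encoder = tfm.nlp.networks.XLNetBase(
    vocab_size=32000,
    num_layers=6,
    hidden_size=256,
    num_attention_heads=4,
    head_size=64,
    inner_size=1024,
    dropout_rate=0.1,
    attention_dropout_rate=0.1,
    attention_type="bi",
    bi_data=False,
    initializer=tf.keras.initializers.RandomNormal(stddev=0.02),
)
```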
Attributes

| Attribute | Description |
|---|---|
| vocab_size | int, the number of tokens in the vocabulary. |
| num_layers | int, the number of layers. |
| hidden_size | int, the hidden size. |
| num_attention_heads | int, the number of attention heads. |
| head_size | int, the dimension size of each attention head. |
| inner_size | int, the hidden size in the feed-forward layers. |
| dropout_rate | float, the dropout rate. |
| attention_dropout_rate | float, the dropout rate applied to attention probabilities. |
| attention_type | str, "uni" or "bi". |
| bi_data | bool, whether to use a bidirectional input pipeline. Usually set to True during pretraining and False during finetuning. |
| initializer | A tf initializer. |
| two_stream | bool, whether to use TwoStreamRelativeAttention, as in the XLNet pretrainer. If False, MultiHeadRelativeAttention is used, as in Transformer-XL. |
| tie_attention_biases | bool, whether to tie the attention biases together. Usually set to True. Used for backwards compatibility. |
| memory_length | int, the number of tokens to cache. |
| same_length | bool, whether to use the same attention length for each token. |
| clamp_length | int, clamp all relative distances larger than clamp_length. -1 means no clamping. |
| reuse_length | int, the number of tokens in the current batch to be cached and reused in the future. |
| inner_activation | str, "relu" or "gelu". |
| use_cls_mask | bool, whether the cls mask is included in the input sequences. |
| embedding_width | The width of the word embeddings. If the embedding width is not equal to the hidden size, the embedding parameters are factorized into two matrices of shape [vocab_size, embedding_width] and [embedding_width, hidden_size] (see the shape sketch after this table). |
| embedding_layer | The word embedding layer. None means a new embedding layer is created; otherwise the given embedding layer is reused. This parameter was originally added for the ELECTRA model, which needs to tie the generator embeddings with the discriminator embeddings. |
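
To illustrate the factorization described for embedding_width, here is a shape-only sketch using plain Keras layers (not the library's internal implementation): tokens are first embedded at embedding_width and then projected up to hidden_size.

```python
import tensorflow as tf

vocab_size, embedding_width, hidden_size = 32000, 128, 256  # illustrative sizes

# Factorized embedding: a [vocab_size, embedding_width] lookup table ...
embedding = tf.keras.layers.Embedding(vocab_size, embedding_width)
# ... followed by an [embedding_width, hidden_size] projection.
projection = tf.keras.layers.Dense(hidden_size, use_bias=False)

token_ids = tf.constant([[5, 17, 42]])           # shape [batch, seq_len]
hidden_states = projection(embedding(token_ids))
print(hidden_states.shape)                       # (1, 3, 256)
```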
Methods
call
call(
    inputs
)
Implements call() for the layer.
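
A hedged usage sketch, continuing the construction example above; the input dictionary keys (input_ids, segment_ids, input_mask) and the structure of the return value are assumptions about this API and should be verified against the installed version.

```python
import tensorflow as tf

batch_size, seq_length = 2, 8

# Assumed input keys; verify against the installed version of the API.
inputs = {
    "input_ids": tf.random.uniform(
        (batch_size, seq_length), maxval=32000, dtype=tf.int32),
    "segment_ids": tf.zeros((batch_size, seq_length), dtype=tf.int32),
    "input_mask": tf.ones((batch_size, seq_length), dtype=tf.float32),
}

# Return structure is version-dependent; inspect it before unpacking.
outputs = xlnet_encoder(inputs)
```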
get_embedding_lookup_table
get_embedding_lookup_table()
Returns the embedding layer weights.
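
For example (continuing the construction sketch above), the table can be read back, e.g. to tie these embeddings with another model's as described for embedding_layer; the shape noted in the comment is an assumption.

```python
embedding_table = xlnet_encoder.get_embedding_lookup_table()
# Assumed shape: (vocab_size, hidden_size) when embedding_width is None.
print(embedding_table.shape)
```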