ALBERT (https://arxiv.org/abs/1909.11942) text encoder network.

This network implements the encoder described in the paper "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (https://arxiv.org/abs/1909.11942).

Compared with BERT (https://arxiv.org/abs/1810.04805), ALBERT factorizes the embedding parameters into two smaller matrices and shares parameters across layers.
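To make the two savings concrete, here is a small arithmetic sketch that counts weight-matrix parameters only (biases and layer norms are ignored). The helper names are hypothetical. The sizes H=768, E=128 and 12 layers follow ALBERT-Base as described in the paper, while the 30,000-token vocabulary and the per-layer breakdown (four H×H attention projections plus two feed-forward matrices) are simplifying assumptions for illustration.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_width: Optional[int] = None) -> int:
    """Word-embedding weight count, ignoring biases (illustrative only)."""
    if embedding_width is None or embedding_width == hidden_size:
        # BERT-style: a single [vocab_size, hidden_size] table.
        return vocab_size * hidden_size
    # ALBERT-style: [vocab_size, embedding_width] table
    # plus an [embedding_width, hidden_size] projection.
    return vocab_size * embedding_width + embedding_width * hidden_size


def encoder_layer_params(hidden_size: int, intermediate_size: int,
                         num_layers: int, shared: bool) -> int:
    """Transformer weight-matrix count, ignoring biases and layer norms."""
    # Assumed breakdown: Q, K, V and output projections (4 * H * H)
    # plus the two feed-forward matrices (2 * H * I).
    per_layer = 4 * hidden_size * hidden_size + 2 * hidden_size * intermediate_size
    # With cross-layer sharing, every layer reuses the same weights.
    return per_layer if shared else per_layer * num_layers


print(embedding_params(30000, 768))                       # 23040000
print(embedding_params(30000, 768, 128))                  # 3938304
print(encoder_layer_params(768, 3072, 12, shared=False))  # 84934656
print(encoder_layer_params(768, 3072, 12, shared=True))   # 7077888
```

Note that sharing reduces the number of stored parameters, not the amount of computation: each of the 12 layers still runs, it just reuses the same weights.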

The default values for this object are taken from the ALBERT-Base implementation described in the paper.

vocab_size The size of the token vocabulary.
embedding_width The width of the word embeddings. If the embedding width is not equal to the hidden size, the embedding parameters are factorized into two matrices of shapes [vocab_size, embedding_width] and [embedding_width, hidden_size].
hidden_size The size of the transformer hidden layers.
num_layers The number of transformer layers.
num_attention_heads The number of attention heads in each transformer layer. The hidden size must be divisible by the number of attention heads.
max_sequence_length The maximum sequence length that this encoder can consume. If None, the sequence length of the inputs is used instead. This determines the variable shape of the positional embeddings.
type_vocab_size The number of types that the 'type_ids' input can take.
intermediate_size The intermediate size for the transformer layers.
activation The activation to use for the transformer layers.
dropout_rate The dropout rate to use for the transformer layers.
attention_dropout_rate The dropout rate to use for the attention layers within the transformer layers.
initializer The initializer to use for all weights in this encoder.
dict_outputs Whether to use a dictionary as the model outputs.
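Several of the arguments above constrain each other. The following is a hypothetical, framework-free sketch (AlbertConfigSketch is not part of any library) that checks the documented divisibility rule, derives the per-head size, reports the word-embedding shapes implied by embedding_width, and applies the documented rule that a max_sequence_length of None falls back to the input sequence length. The values H=768 and E=128 follow ALBERT-Base from the paper; vocab_size, max_sequence_length and the head count are assumed illustrative values, not necessarily the encoder's actual defaults.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AlbertConfigSketch:
    """Hypothetical stand-in for a few of the encoder's constructor arguments."""
    vocab_size: int = 30000                     # assumed illustrative value
    embedding_width: int = 128                  # ALBERT-Base E
    hidden_size: int = 768                      # ALBERT-Base H
    num_attention_heads: int = 12               # assumed (H / 64)
    max_sequence_length: Optional[int] = 512    # assumed illustrative value

    def __post_init__(self) -> None:
        # Each head receives an equal slice of the hidden vector.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError(
                f"hidden_size ({self.hidden_size}) must be divisible by "
                f"num_attention_heads ({self.num_attention_heads})")

    @property
    def head_size(self) -> int:
        return self.hidden_size // self.num_attention_heads

    def embedding_shapes(self) -> Dict[str, Tuple[int, int]]:
        """Word-embedding weight shapes implied by embedding_width (biases omitted)."""
        shapes = {"word_embeddings": (self.vocab_size, self.embedding_width)}
        if self.embedding_width != self.hidden_size:
            # Factorized case: an extra projection up to the hidden size.
            shapes["embedding_projection"] = (self.embedding_width, self.hidden_size)
        return shapes

    def position_table_rows(self, seq_length: int) -> int:
        """Rows in the positional-embedding table for a given input length."""
        # None means: size the table from the input sequence length.
        rows = seq_length if self.max_sequence_length is None else self.max_sequence_length
        if seq_length > rows:
            raise ValueError(f"sequence length {seq_length} exceeds {rows}")
        return rows


cfg = AlbertConfigSketch()
print(cfg.head_size)                 # 64
print(cfg.embedding_shapes())        # {'word_embeddings': (30000, 128), 'embedding_projection': (128, 768)}
print(cfg.position_table_rows(64))   # 512
```

Raising in __post_init__ surfaces an invalid head count at construction time rather than deep inside an attention layer.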



Calls the model on new inputs.

In this case, call simply reapplies all ops in the graph to the new inputs (i.e., it builds a new computational graph from the provided inputs).

inputs A tensor or list of tensors.
training Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask A mask or list of masks. A mask can be either a tensor or None (no mask).

A tensor if there is a single output, or a list of tensors if there is more than one output.
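The call semantics described above can be illustrated with a toy, framework-free sketch. The hypothetical ReplayNetwork class stores a fixed, topologically ordered list of ops and re-runs them against whatever inputs it receives, returning a bare value for a single output and a list for several, matching the documented return convention. It does not model the training or mask arguments, and it is not how Keras implements functional models internally.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple, Union

# (node name, op, names of the values the op consumes)
Node = Tuple[str, Callable[..., Any], Sequence[str]]


class ReplayNetwork:
    """Toy functional network: a recorded chain of ops replayed on new inputs."""

    def __init__(self, input_name: str, nodes: Sequence[Node],
                 output_names: Sequence[str]) -> None:
        self.input_name = input_name
        self.nodes = list(nodes)
        self.output_names = list(output_names)

    def __call__(self, inputs: Any) -> Union[Any, List[Any]]:
        # Fresh value table per call, so the same ops run on the new inputs.
        values: Dict[str, Any] = {self.input_name: inputs}
        for name, fn, arg_names in self.nodes:
            values[name] = fn(*(values[a] for a in arg_names))
        outputs = [values[n] for n in self.output_names]
        # One output -> bare value; several outputs -> list.
        return outputs[0] if len(outputs) == 1 else outputs


nodes = [
    ("doubled", lambda x: 2 * x, ["x"]),
    ("plus_one", lambda d: d + 1, ["doubled"]),
]
single = ReplayNetwork("x", nodes, ["plus_one"])
multi = ReplayNetwork("x", nodes, ["doubled", "plus_one"])
print(single(3))   # 7
print(multi(3))    # [6, 7]
print(single(10))  # 21
```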


