

A Keras functional API implementation for MobileBERT encoder.

Args:
  word_vocab_size: Number of words in the vocabulary.
  word_embed_size: Word embedding size.
  type_vocab_size: Number of word types.
  max_sequence_length: Maximum length of the input sequence.
  num_blocks: Number of transformer blocks in the encoder model.
  hidden_size: Hidden size for the transformer block.
  num_attention_heads: Number of attention heads in the transformer block.
  intermediate_size: The size of the "intermediate" (a.k.a. feed-forward) layer.
  intermediate_act_fn: The non-linear activation function to apply to the output of the intermediate/feed-forward layer.
  hidden_dropout_prob: Dropout probability for the hidden layers.
  attention_probs_dropout_prob: Dropout probability of the attention probabilities.
  intra_bottleneck_size: Size of the bottleneck.
  initializer_range: The stddev of the truncated_normal_initializer used to initialize all weight matrices.
  use_bottleneck_attention: Whether to use attention inputs from the bottleneck transformation. If True, key_query_shared_bottleneck is ignored.
  key_query_shared_bottleneck: Whether to share the linear transformation for keys and queries.
  num_feedforward_networks: Number of stacked feed-forward networks.
  normalization_type: The type of normalization; only 'no_norm' and 'layer_norm' are supported. 'no_norm' represents the element-wise linear transformation for the student model, as suggested by the original MobileBERT paper; 'layer_norm' is used for the teacher model.
  classifier_activation: Whether to use the tanh activation for the final representation of the [CLS] token in fine-tuning.
  **kwargs: Other keyword arguments.

Attributes:
  pooler_layer: The pooler dense layer after the transformer layers.
  transformer_layers: List of Transformer layers in the encoder.



Calls the model on new inputs.

In this case `call` just reapplies all ops in the graph to the new inputs (i.e., builds a new computational graph from the provided inputs).

Args:
  inputs: A tensor or list of tensors.
  training: Boolean or boolean scalar tensor, indicating whether to run the network in training mode or inference mode.
  mask: A mask or list of masks. A mask can be either a tensor or None (no mask).

Returns:
  A tensor if there is a single output, or a list of tensors if there is more than one output.
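To illustrate the call semantics (using a small generic Keras functional model rather than the MobileBERT encoder itself): passing `training=False` runs the model in inference mode, so stochastic layers such as dropout are disabled and repeated calls on the same inputs agree; a single-output model returns a single tensor:

```python
import numpy as np
import tensorflow as tf

# A small functional model with dropout, to demonstrate the training flag.
inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(8, activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(2)(x)
model = tf.keras.Model(inputs, outputs)

batch = np.ones((3, 4), dtype="float32")

# Inference mode: dropout is a no-op, so repeated calls are deterministic.
y1 = model(batch, training=False)
y2 = model(batch, training=False)
assert np.allclose(y1.numpy(), y2.numpy())

# One output layer, so the call returns a single tensor (not a list).
assert y1.shape == (3, 2)
```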

