BERT pretraining model.

[Note] Please use the new BertPretrainerV2 for your projects.

The BertPretrainer allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives.

network: A transformer network. This network should output a sequence output and a classification output.
num_classes: Number of classes to predict from the classification network.
num_token_predictions: Number of tokens to predict from the masked LM.
embedding_table: Embedding table of a network. If None, "network.get_embedding_table()" is used.
activation: The activation (if any) to use in the masked LM network. If None, no activation will be used.
initializer: The initializer (if any) to use in the masked LM and classification networks. Defaults to a Glorot uniform initializer.
output: The output style for this network. Can be either "logits" or "predictions".



Calls the model on new inputs and returns the outputs as tensors.

In this case, call() simply reapplies all ops in the graph to the new inputs, i.e., it builds a new computation graph from the provided inputs.

inputs: Input tensor, or dict/list/tuple of input tensors.
training: Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask: A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, see the Keras masking and padding guide.

Returns a tensor if there is a single output, or a list of tensors if there is more than one output.