XLNet-based pretrainer.

This is an implementation of the network structure surrounding a Transformer-XL encoder as described in "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237).

network An XLNet/Transformer-XL based network. This network should return a sequence output and a list of state tensors.
mlm_activation The activation (if any) to use in the Masked LM network. If None, then no activation will be used.
mlm_initializer The initializer (if any) to use in the masked LM. Defaults to a Glorot uniform initializer.
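For reference, the Glorot uniform default named for mlm_initializer draws weights from a uniform distribution bounded by sqrt(6 / (fan_in + fan_out)). The sketch below reproduces that rule in plain Python. The function name `glorot_uniform` and the layer sizes are illustrative assumptions, not part of this class's API.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out weight matrix drawn from U(-limit, limit)."""
    # Glorot (Xavier) uniform bound: limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Hypothetical layer sizes, chosen only for illustration.
weights = glorot_uniform(fan_in=4, fan_out=2, seed=0)
limit = math.sqrt(6.0 / (4 + 2))  # the bound for these sizes
```

Every sampled weight lies within the bound, which shrinks as the layer gets wider, so larger layers start with smaller weights.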




Calls the model on new inputs.

In this case, call() just reapplies all ops in the graph to the new inputs (i.e. it builds a new computational graph from the provided inputs).

inputs A tensor or list of tensors.
training Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask A mask or list of masks. A mask can be either a tensor or None (no mask).

A tensor if there is a single output, or a list of tensors if there is more than one output.
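As a rough illustration of what the training argument typically controls, the sketch below uses dropout, which is active in training mode and an identity in inference mode. This is an assumption chosen for illustration: the layers actually affected inside this model are not listed here, and `apply_dropout` is a hypothetical helper, not a library function.

```python
import random


def apply_dropout(values, rate, training, seed=None):
    """Inverted dropout over a flat list of floats."""
    if not training or rate == 0.0:
        # Inference mode: values pass through unchanged.
        return list(values)
    rng = random.Random(seed)
    keep = 1.0 - rate
    # Training mode: zero each value with probability `rate` and rescale
    # the survivors so the expected value is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


hidden = [1.0, 2.0, 3.0, 4.0]
inference_out = apply_dropout(hidden, rate=0.5, training=False)
training_out = apply_dropout(hidden, rate=0.5, training=True, seed=0)
```

With training=False the output equals the input. With training=True each element is either zeroed or scaled by 1 / (1 - rate).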