This is an implementation of the network structure surrounding a
Transformer-XL encoder as described in "XLNet: Generalized Autoregressive
Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237).
Args
network
A transformer network that returns both a sequence output and a
classification output. It must also expose its embedding table via a
"get_embedding_table" method.
start_n_top
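Beam size for span start, i.e. the number of top-scoring start positions kept.
end_n_top
Beam size for span end, i.e. the number of top-scoring end positions kept per start position.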
dropout_rate
The dropout rate for the span labeling layer.
span_labeling_activation
The activation for the span labeling head.
initializer
The initializer (if any) to use in the span labeling network.
Defaults to a Glorot uniform initializer.
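
To illustrate what the two beam sizes control, here is a minimal sketch in plain Python of a two-stage beam over span boundaries. It is not this model's implementation: the function name top_span_candidates and the input layout (a list of start log-probabilities, plus a matrix of end log-probabilities indexed by start position) are assumptions made for the example.

```python
import heapq


def top_span_candidates(start_log_probs, end_log_probs, start_n_top, end_n_top):
    """Enumerate (score, start, end) span candidates with a two-stage beam.

    start_log_probs: list of floats, one per token position.
    end_log_probs: list of lists, where end_log_probs[s][e] is the
      log-probability of ending at e given start s (hypothetical layout).
    """
    positions = range(len(start_log_probs))

    # Stage 1: keep the start_n_top highest-scoring start positions.
    top_starts = heapq.nlargest(
        start_n_top, positions, key=lambda i: start_log_probs[i])

    candidates = []
    for s in top_starts:
        # Stage 2: for each kept start, keep the end_n_top best end positions.
        top_ends = heapq.nlargest(
            end_n_top, positions, key=lambda e: end_log_probs[s][e])
        for e in top_ends:
            score = start_log_probs[s] + end_log_probs[s][e]
            candidates.append((score, s, e))

    # Highest combined score first.
    candidates.sort(reverse=True)
    return candidates


if __name__ == "__main__":
    starts = [-0.25, -3.0, -2.5]
    ends = [
        [-2.0, -0.5, -1.0],
        [-3.0, -1.0, -0.25],
        [-4.0, -3.0, -0.125],
    ]
    for score, s, e in top_span_candidates(starts, ends, 2, 2):
        print(score, s, e)
```

The sketch yields at most start_n_top * end_n_top candidates. It does not filter out spans whose end precedes their start; a real decoder would typically add that check along with any length limits.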