Applies dynamic language model masking.
text.mask_language_model(
    input_ids, item_selector, mask_values_chooser, axis=1
)
mask_language_model implements the Masked LM and Masking Procedure described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/pdf/1810.04805.pdf).
mask_language_model uses an ItemSelector to select the items for masking, and a MaskValuesChooser to assign values to the selected items. Sometimes keeping a selected item unchanged, rather than always masking it, biases the representation towards the actual observed item.
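The selector/chooser split described above can be illustrated with two plain functions. This is a hedged, stdlib-only sketch: the names `select_items`, `choose_value`, and `apply_masking` are hypothetical stand-ins, not the actual ItemSelector / MaskValuesChooser classes, and the 80/10/10 rates are the BERT defaults, assumed for illustration.

```python
import random

def select_items(row, selection_rate, rng):
    """Item-selector role: return the indices chosen for masking."""
    return [i for i in range(len(row)) if rng.random() < selection_rate]

def choose_value(original, vocab, rng):
    """Values-chooser role: pick a replacement for one selected item.

    Assumed BERT-style rates: 80% [MASK], 10% random token, 10% unchanged.
    """
    r = rng.random()
    if r < 0.8:
        return b"[MASK]"          # replace with the mask token
    if r < 0.9:
        return rng.choice(vocab)  # replace with a random vocab token
    return original               # keep the observed item

def apply_masking(row, vocab, selection_rate=0.15, seed=None):
    """Compose the two roles over a single row of items."""
    rng = random.Random(seed)
    out = list(row)
    for i in select_items(out, selection_rate, rng):
        out[i] = choose_value(out[i], vocab, rng)
    return out
```

Keeping selection and value assignment separate is what lets the same selection policy be reused with different replacement schemes, which mirrors why the API takes two distinct objects.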
Masking is performed on items along an axis. For each selected item, a decision is made independently at random to replace it with [MASK], replace it with a random token from the full vocabulary, or leave it unchanged. Note that the masking decision is broadcast to the sub-dimensions. For example, in a RaggedTensor of shape [batch, (wordpieces)] with axis=1, each wordpiece is independently masked (or not).
With the following input:
[[b"Sp", b"##onge", b"bob", b"Sq", b"##uare", b"##pants" ],
[b"Bar", b"##ack", b"Ob", b"##ama"],
[b"Mar", b"##vel", b"A", b"##ven", b"##gers"]],
mask_language_model
could end up masking individual wordpieces:
[[b"[MASK]", b"##onge", b"bob", b"Sq", b"[MASK]", b"##pants" ],
[b"Bar", b"##ack", b"[MASK]", b"##ama"],
[b"[MASK]", b"##vel", b"A", b"##ven", b"##gers"]]
Or with random tokens substituted:
[[b"[MASK]", b"##onge", b"bob", b"Sq", b"[MASK]", b"##pants" ],
[b"Bar", b"##ack", b"Sq", b"##ama"],  # random token substituted for 'Ob'
[b"Bar", b"##vel", b"A", b"##ven", b"##gers"]]  # random token substituted for 'Mar'
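The per-wordpiece behavior in the two outputs above can be sketched in plain Python. This is an illustrative approximation, not the library implementation: `mask_wordpieces`, `VOCAB`, and the 0.8/0.1/0.1 split are assumptions (the rates follow the BERT paper's defaults).

```python
import random

# Hypothetical small vocabulary for the random-token branch.
VOCAB = [b"Sp", b"##onge", b"bob", b"Sq", b"##uare", b"##pants",
         b"Bar", b"##ack", b"Ob", b"##ama", b"Mar", b"##vel"]

def mask_wordpieces(batch, selection_rate=0.15, seed=None):
    """Mask each wordpiece independently (the axis=1 case above)."""
    rng = random.Random(seed)
    out = []
    for row in batch:
        new_row = []
        for piece in row:
            if rng.random() < selection_rate:  # selected for masking
                r = rng.random()
                if r < 0.8:                    # 80%: replace with [MASK]
                    piece = b"[MASK]"
                elif r < 0.9:                  # 10%: random vocab token
                    piece = rng.choice(VOCAB)
                # remaining 10%: keep the observed wordpiece
            new_row.append(piece)
        out.append(new_row)
    return out

batch = [[b"Sp", b"##onge", b"bob", b"Sq", b"##uare", b"##pants"],
         [b"Bar", b"##ack", b"Ob", b"##ama"]]
masked = mask_wordpieces(batch, seed=0)
```

Because each wordpiece carries its own coin flip, a word like "Sponge" can end up with some of its pieces masked and others intact, exactly as in the first example output.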
In a RaggedTensor of shape [batch, (words), (wordpieces)], whole words get masked (or not). If a word gets masked, all of its wordpieces are either replaced by [MASK], replaced by random tokens, or left unchanged. Note that any arbitrary spans that can be constructed by a RaggedTensor can be masked in the same way.
For example, given a RaggedTensor with shape [batch, (token), (wordpieces)]:
[[[b"Sp", "##onge"], [b"bob"], [b"Sq", b"##uare", b"##pants"]],
[[b"Bar", "##ack"], [b"Ob", b"##ama"]],
[[b"Mar", "##vel"], [b"A", b"##ven", b"##gers"]]]
mask_language_model could mask whole spans (items grouped together by the same first dimension):
[[[b"[MASK]", "[MASK]"], [b"bob"], [b"Sq", b"##uare", b"##pants"]],
[[b"Bar", "##ack"], [b"[MASK]", b"[MASK]"]],
[[b"[MASK]", "[MASK]"], [b"A", b"##ven", b"##gers"]]]
or substitute random items within spans:
[[[b"Mar", "##ama"], [b"bob"], [b"Sq", b"##uare", b"##pants"]],
[[b"Bar", "##ack"], [b"##onge", b"##gers"]],
[[b"Ob", "Sp"], [b"A", b"##ven", b"##gers"]]]
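The span-level behavior above, where one masking decision is broadcast to every wordpiece inside the selected span, can be sketched as follows. As before this is a stdlib-only approximation under assumed names (`mask_spans`, `VOCAB`) and assumed BERT-default rates, not the library's actual code.

```python
import random

# Hypothetical small vocabulary for the random-token branch.
VOCAB = [b"Sp", b"##onge", b"bob", b"Sq", b"##uare", b"##pants",
         b"Bar", b"##ack", b"Ob", b"##ama", b"Mar", b"##vel"]

def mask_spans(batch, selection_rate=0.15, seed=None):
    """Decide once per span, then broadcast that decision to its wordpieces."""
    rng = random.Random(seed)
    out = []
    for row in batch:
        new_row = []
        for span in row:
            span = list(span)
            if rng.random() < selection_rate:  # one decision per span
                r = rng.random()
                if r < 0.8:                    # whole span becomes [MASK]
                    span = [b"[MASK]"] * len(span)
                elif r < 0.9:                  # whole span becomes random tokens
                    span = [rng.choice(VOCAB) for _ in span]
                # else: keep the whole span unchanged
            new_row.append(span)
        out.append(new_row)
    return out
```

The key contrast with the flat axis=1 case is that the random draw happens in the span loop, not the wordpiece loop, so a span like [b"Mar", b"##vel"] is always treated as a unit.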