Packs tokens into model inputs for BERT.
tfm.nlp.layers.BertPackInputs(
seq_length,
*,
start_of_sequence_id=None,
end_of_segment_id=None,
padding_id=None,
special_tokens_dict=None,
truncator='round_robin',
**kwargs
)
Args |
seq_length
|
The desired output length. Must not exceed the max_seq_length
that was fixed at training time for the BERT model receiving the inputs.
|
start_of_sequence_id
|
The numeric id of the token that is to be placed
at the start of each sequence (called "[CLS]" for BERT).
|
end_of_segment_id
|
The numeric id of the token that is to be placed
at the end of each input segment (called "[SEP]" for BERT).
|
padding_id
|
The numeric id of the token that is to be placed into the
unused positions after the last segment in the sequence
(called "[PAD]" for BERT).
|
special_tokens_dict
|
Optionally, a dict from Python strings to Python
integers that contains values for start_of_sequence_id,
end_of_segment_id and padding_id. (Further values in the dict are
silently ignored.) If this is passed, the separate *_id arguments must be
omitted.
|
truncator
|
The algorithm used to truncate a list of batched segments to fit a
per-example length limit. The value can be either "round_robin" or
"waterfall":
(1) The "round_robin" algorithm assigns available space
one token at a time in a round-robin fashion to the inputs that still
need some, until the limit is reached. It currently supports only
one or two segments.
(2) The "waterfall" algorithm allocates the budget in a
left-to-right manner, filling each segment in turn until the
budget runs out. It supports an arbitrary number of segments.
|
**kwargs
|
Standard arguments to Layer().
|
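The two truncator options described above differ only in how they divide the per-example token budget among segments. The following is a minimal pure-Python sketch of that difference, not the library's implementation (which operates on batched RaggedTensors). The function names `round_robin_budget` and `waterfall_budget` are hypothetical, and `budget` is assumed to be the space left after reserving positions for special tokens.

```python
from typing import List


def round_robin_budget(lengths: List[int], budget: int) -> List[int]:
    """Hypothetical helper: hand out the budget one token at a time,
    cycling over the segments that still have tokens left to keep."""
    kept = [0] * len(lengths)
    remaining = budget
    while remaining > 0:
        progressed = False
        for i, length in enumerate(lengths):
            if remaining == 0:
                break
            if kept[i] < length:
                kept[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # Every segment already fits; the rest of the budget is unused.
            break
    return kept


def waterfall_budget(lengths: List[int], budget: int) -> List[int]:
    """Hypothetical helper: fill segments left to right until the
    budget runs out."""
    kept = []
    remaining = budget
    for length in lengths:
        take = min(length, remaining)
        kept.append(take)
        remaining -= take
    return kept


# Two segments of 5 and 2 tokens competing for 5 positions.
print(round_robin_budget([5, 2], 5))  # [3, 2]
print(waterfall_budget([5, 2], 5))    # [5, 0]
```

In this sketch, round-robin keeps part of every segment, while waterfall favors earlier segments and may drop later ones entirely.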
Raises |
ImportError
|
If importing tensorflow_text fails.
|
Methods
bert_pack_inputs
@staticmethod
bert_pack_inputs(
inputs: Union[tf.RaggedTensor, List[tf.RaggedTensor]],
seq_length: Union[int, tf.Tensor],
start_of_sequence_id: Union[int, tf.Tensor],
end_of_segment_id: Union[int, tf.Tensor],
padding_id: Union[int, tf.Tensor],
truncator='round_robin'
)
Freestanding equivalent of the BertPackInputs layer.
call
call(
inputs: Union[tf.RaggedTensor, List[tf.RaggedTensor]]
)
Adds special tokens to pack a list of segments into BERT input Tensors.
Args |
inputs
|
A Python list of one or two RaggedTensors, each with the batched
values of one input segment. The j-th segment of the i-th input example
consists of slice inputs[j][i, ...].
|
Returns |
A nest of Tensors for use as input to the BERT TransformerEncoder.
|
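To make the `inputs[j][i, ...]` layout and the packed result concrete, here is a minimal pure-Python sketch that uses nested lists in place of RaggedTensors. It is an illustration under stated assumptions, not the layer's implementation: the helper name `pack_batch` is hypothetical, segments are assumed to be truncated already, and the ids 101, 102 and 0 are example values for "[CLS]", "[SEP]" and "[PAD]". The output keys `input_word_ids`, `input_mask` and `input_type_ids` follow the names the layer uses for its returned dict.

```python
from typing import Dict, List


def pack_batch(
    inputs: List[List[List[int]]],
    seq_length: int,
    start_of_sequence_id: int,
    end_of_segment_id: int,
    padding_id: int,
) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: inputs[j][i] holds segment j of example i."""
    packed = {"input_word_ids": [], "input_mask": [], "input_type_ids": []}
    batch_size = len(inputs[0])
    for i in range(batch_size):
        word_ids = [start_of_sequence_id]
        type_ids = [0]
        for j, segment in enumerate(inputs):
            tokens = segment[i]
            # Each segment is followed by its own end-of-segment token,
            # and both share the segment index as their type id.
            word_ids.extend(tokens)
            word_ids.append(end_of_segment_id)
            type_ids.extend([j] * (len(tokens) + 1))
        used = len(word_ids)
        if used > seq_length:
            raise ValueError("Segments must be truncated before packing.")
        num_pad = seq_length - used
        packed["input_word_ids"].append(word_ids + [padding_id] * num_pad)
        packed["input_mask"].append([1] * used + [0] * num_pad)
        packed["input_type_ids"].append(type_ids + [0] * num_pad)
    return packed


# Two examples, two segments each: first_segments[i] pairs with second_segments[i].
first_segments = [[7, 8, 9], [4]]
second_segments = [[10, 11], [5, 6]]
result = pack_batch([first_segments, second_segments], 10, 101, 102, 0)
print(result["input_word_ids"][0])  # [101, 7, 8, 9, 102, 10, 11, 102, 0, 0]
print(result["input_type_ids"][0])  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```

Every packed row has exactly `seq_length` entries, and `input_mask` marks which of them hold real or special tokens rather than padding.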