MobileBERT encoder configuration.
Inherits From: Config, ParamsDict
tfm.nlp.encoders.MobileBertEncoderConfig(
default_params: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None,
restrictions: dataclasses.InitVar[Optional[List[str]]] = None,
word_vocab_size: int = 30522,
word_embed_size: int = 128,
type_vocab_size: int = 2,
max_sequence_length: int = 512,
num_blocks: int = 24,
hidden_size: int = 512,
num_attention_heads: int = 4,
intermediate_size: int = 4096,
hidden_activation: str = 'gelu',
hidden_dropout_prob: float = 0.1,
attention_probs_dropout_prob: float = 0.1,
intra_bottleneck_size: int = 1024,
initializer_range: float = 0.02,
use_bottleneck_attention: bool = False,
key_query_shared_bottleneck: bool = False,
num_feedforward_networks: int = 1,
normalization_type: str = 'layer_norm',
classifier_activation: bool = True,
input_mask_dtype: str = 'int32'
)
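As a sketch of how these defaults and per-field overrides behave, the following minimal pure-Python dataclass mirrors a few of the fields above. This is illustrative only; the real class inherits from Config and ParamsDict and carries many more fields.

```python
from dataclasses import dataclass

# Minimal sketch of a few MobileBertEncoderConfig fields and their
# documented defaults (not the real class, which is a ParamsDict).
@dataclass
class MobileBertEncoderConfigSketch:
    word_vocab_size: int = 30522
    word_embed_size: int = 128
    num_blocks: int = 24
    hidden_size: int = 512
    num_attention_heads: int = 4

# Any subset of fields can be overridden at construction time;
# the rest keep their defaults.
cfg = MobileBertEncoderConfigSketch(num_blocks=12, hidden_size=256)
print(cfg.num_blocks, cfg.hidden_size)  # 12 256
print(cfg.word_vocab_size)              # 30522
```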
Attributes | Description
---|---
word_vocab_size | Number of words in the vocabulary.
word_embed_size | Word embedding size.
type_vocab_size | Number of word types.
max_sequence_length | Maximum length of the input sequence.
num_blocks | Number of transformer blocks in the encoder model.
hidden_size | The hidden size for the transformer block.
num_attention_heads | Number of attention heads in the transformer block.
intermediate_size | The size of the "intermediate" (a.k.a. feed-forward) layer.
hidden_activation | The non-linear activation function applied to the output of the intermediate/feed-forward layer.
hidden_dropout_prob | Dropout probability for the hidden layers.
attention_probs_dropout_prob | Dropout probability of the attention probabilities.
intra_bottleneck_size | The size of the bottleneck.
initializer_range | The stddev of the truncated_normal_initializer for initializing all weight matrices.
use_bottleneck_attention | Whether to use attention inputs from the bottleneck transformation. If True, the following key_query_shared_bottleneck is ignored.
key_query_shared_bottleneck | Whether to share the linear transformation for keys and queries.
num_feedforward_networks | Number of stacked feed-forward networks.
normalization_type | The type of normalization; only 'no_norm' and 'layer_norm' are supported. 'no_norm' represents the element-wise linear transformation for the student model, as suggested by the original MobileBERT paper; 'layer_norm' is used for the teacher model.
classifier_activation | Whether to use the tanh activation for the final representation of the [CLS] token in fine-tuning.
BUILDER |
default_params | Dataclass field.
restrictions | Dataclass field.
input_mask_dtype | Dataclass field.
Methods
as_dict
as_dict()
Returns a dict representation of params_dict.ParamsDict.
For the nested params_dict.ParamsDict, a nested dict will be returned.
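The nested behavior can be sketched with a minimal, hypothetical stand-in for ParamsDict (not the real implementation): nested params objects are recursively converted to nested dicts.

```python
# Minimal sketch of as_dict semantics: nested params become nested dicts.
# ParamsSketch is a hypothetical stand-in, not the real ParamsDict.
class ParamsSketch:
    def __init__(self, **kwargs):
        self._params = dict(kwargs)

    def as_dict(self):
        return {
            k: v.as_dict() if isinstance(v, ParamsSketch) else v
            for k, v in self._params.items()
        }

p = ParamsSketch(hidden_size=512, inner=ParamsSketch(dropout=0.1))
print(p.as_dict())  # {'hidden_size': 512, 'inner': {'dropout': 0.1}}
```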
from_args
@classmethod
from_args( *args, **kwargs )
Builds a config from the given list of arguments.
from_json
@classmethod
from_json( file_path: str )
Wrapper for from_yaml.
from_yaml
@classmethod
from_yaml( file_path: str )
get
get(
key, value=None
)
Accesses a parameter through the built-in dictionary get method, returning value if the key is absent.
lock
lock()
Makes the ParamsDict immutable.
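A minimal sketch of what "immutable after lock()" means in practice (a hypothetical class, not the real ParamsDict): attribute mutation is allowed before lock() and raises afterwards.

```python
# Sketch of lock() semantics with a hypothetical lockable params class.
class LockableParams:
    def __init__(self, **kwargs):
        object.__setattr__(self, '_locked', False)
        for k, v in kwargs.items():
            object.__setattr__(self, k, v)

    def lock(self):
        # After this call, any attribute assignment raises.
        object.__setattr__(self, '_locked', True)

    def __setattr__(self, name, value):
        if getattr(self, '_locked', False):
            raise ValueError('params are locked; mutation is not allowed')
        object.__setattr__(self, name, value)

p = LockableParams(hidden_size=512)
p.hidden_size = 256      # allowed before lock()
p.lock()
try:
    p.hidden_size = 128  # raises after lock()
except ValueError as e:
    print(e)
```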
override
override(
override_params, is_strict=True
)
Override the ParamsDict with a set of given params.
Args | Description
---|---
override_params | A dict or a ParamsDict specifying the parameters to be overridden.
is_strict | A boolean specifying whether the override is strict. If True, keys in override_params must already be present in the ParamsDict. If False, keys in override_params can differ from what is currently defined in the ParamsDict; in that case, the ParamsDict will be extended to include the new keys.
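The strict/non-strict distinction can be sketched with a plain dict and a hypothetical helper (not the real ParamsDict.override implementation): strict mode rejects unknown keys, non-strict mode extends the params with them.

```python
# Sketch of override semantics with a hypothetical helper function.
def override(params, override_params, is_strict=True):
    if is_strict:
        # Strict mode: every overridden key must already exist.
        unknown = set(override_params) - set(params)
        if unknown:
            raise KeyError(f'unknown keys in strict override: {unknown}')
    # Non-strict mode simply extends params with any new keys.
    params.update(override_params)
    return params

params = {'hidden_size': 512, 'num_blocks': 24}
override(params, {'hidden_size': 256})              # strict: key exists, OK
override(params, {'new_key': 1}, is_strict=False)   # non-strict: key is added
print(params)  # {'hidden_size': 256, 'num_blocks': 24, 'new_key': 1}
```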
replace
replace(
**kwargs
)
Overrides and returns an unlocked copy, leaving the current config unchanged.
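The copy-with-overrides behavior can be sketched for plain dicts with a hypothetical helper (illustrative only): the original is deep-copied, then updated.

```python
import copy

# Sketch of replace() semantics: return a copy with overrides applied,
# leaving the original untouched (hypothetical helper, not the real API).
def replace(params, **kwargs):
    new_params = copy.deepcopy(params)
    new_params.update(kwargs)
    return new_params

base = {'hidden_size': 512, 'num_blocks': 24}
small = replace(base, num_blocks=12)
print(small['num_blocks'], base['num_blocks'])  # 12 24 -- base is unchanged
```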
validate
validate()
Validate the parameters consistency based on the restrictions.
This method validates internal consistency using the pre-defined list of restrictions. A restriction is defined as a string which specifies a binary operation. The supported binary operations are {'==', '!=', '<', '<=', '>', '>='}. Note that the meaning of these operators is consistent with the underlying Python implementation. Users should make sure the restrictions they define make sense for the parameter types involved.
For example, for a ParamsDict like the following
a:
a1: 1
a2: 2
b:
bb:
bb1: 10
bb2: 20
ccc:
a1: 1
a3: 3
one can define two restrictions like this: ['a.a1 == b.ccc.a1', 'a.a2 <= b.bb.bb2']
What they enforce is:
- a.a1 = 1 == b.ccc.a1 = 1
- a.a2 = 2 <= b.bb.bb2 = 20
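One plausible way to evaluate such restriction strings against a nested params dict can be sketched as follows. This is a hypothetical re-implementation for illustration, not the real ParamsDict code; it assumes the operator is surrounded by spaces, as in the example above.

```python
import operator

# Supported binary operations, matching the documented set.
OPS = {'==': operator.eq, '!=': operator.ne, '<=': operator.le,
       '>=': operator.ge, '<': operator.lt, '>': operator.gt}

def lookup(params, dotted_key):
    # Resolve 'a.a1' style keys; raises KeyError if undefined.
    value = params
    for part in dotted_key.split('.'):
        value = value[part]
    return value

def validate(params, restrictions):
    for restriction in restrictions:
        # Try two-character operators before one-character ones.
        for symbol in ('==', '!=', '<=', '>=', '<', '>'):
            if f' {symbol} ' in restriction:
                left, right = restriction.split(f' {symbol} ')
                if not OPS[symbol](lookup(params, left.strip()),
                                   lookup(params, right.strip())):
                    raise KeyError(f'restriction violated: {restriction}')
                break
        else:
            raise ValueError(f'unsupported restriction: {restriction}')

params = {'a': {'a1': 1, 'a2': 2},
          'b': {'bb': {'bb1': 10, 'bb2': 20}, 'ccc': {'a1': 1, 'a3': 3}}}
validate(params, ['a.a1 == b.ccc.a1', 'a.a2 <= b.bb.bb2'])  # passes silently
```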
Raises | Description
---|---
KeyError | If any of the following happens: (1) any parameter in any of the restrictions is not defined in the ParamsDict; (2) any inconsistency violating a restriction is found.
ValueError | If the restriction defined in the string is not supported.
__contains__
__contains__(
key
)
Implements the membership test operator.
__eq__
__eq__(
other
)