Masked matmul router using "experts choose tokens" assignment.
```python
tfm.nlp.layers.ExpertsChooseMaskedRouter(
    num_experts: int,
    *,
    jitter_noise: float = 0.0,
    use_bias: bool = True,
    kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER,
    bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER,
    router_z_loss_weight: float = 0.0,
    export_metrics: bool = True,
    name: str = 'router',
    **kwargs
)
```
This router uses the same mechanism as Mixture-of-Experts with Expert Choice (https://arxiv.org/abs/2202.09368): within each group, each expert selects its top `expert_capacity` tokens. As a result, an individual token may be processed by several experts or by none at all.
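To make the selection rule concrete, here is a minimal NumPy sketch of expert-choice routing. It assumes router logits of shape `[num_groups, tokens_per_group, num_experts]` and mask outputs of shape `[num_groups, tokens_per_group, num_experts, expert_capacity]`; the function name `expert_choice_route` is hypothetical, and this is an illustration of the technique, not the library's implementation. A softmax runs over the expert axis, then each expert keeps the `expert_capacity` highest-probability tokens in its group.

```python
import numpy as np


def expert_choice_route(router_logits: np.ndarray, expert_capacity: int):
    """Expert-choice routing sketch; shapes are assumptions, not the library's code.

    router_logits: [num_groups, tokens_per_group, num_experts]
    Returns (dispatch_mask, combine_array), each of shape
    [num_groups, tokens_per_group, num_experts, expert_capacity].
    """
    num_tokens = router_logits.shape[1]
    # Softmax over the expert axis: each token gets a distribution over experts.
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Expert choice: each expert ranks the tokens of its group by probability
    # and keeps its top `expert_capacity` tokens.
    scores = np.transpose(probs, (0, 2, 1))                        # [g, e, t]
    top_idx = np.argsort(-scores, axis=-1)[..., :expert_capacity]  # [g, e, c]
    top_prob = np.take_along_axis(scores, top_idx, axis=-1)        # [g, e, c]

    # One-hot over tokens, then move the token axis next to the group axis.
    one_hot = np.eye(num_tokens)[top_idx]                # [g, e, c, t]
    dispatch_mask = np.transpose(one_hot, (0, 3, 1, 2))  # [g, t, e, c]
    # Combine weights: router probability of each selected (token, expert) pair.
    combine_array = dispatch_mask * top_prob[:, None, :, :]
    return dispatch_mask, combine_array


rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 8, 4))  # 2 groups, 8 tokens, 4 experts
dispatch, combine = expert_choice_route(logits, expert_capacity=2)
print(dispatch.shape)  # (2, 8, 4, 2)
# Experts per token is not fixed: a token may get several experts or none.
print(dispatch.sum(axis=(2, 3)))
```

Because selection is made per expert, every expert fills exactly `expert_capacity` slots per group, while the number of experts assigned to each token is left free.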
This layer uses the Keras `add_loss()` and `add_metric()` APIs.
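The constructor's `router_z_loss_weight` suggests an auxiliary loss reported through `add_loss()`. Assuming it follows the router z-loss from ST-MoE (the mean over all tokens of the squared log-sum-exp of the router logits), a NumPy sketch looks like this. The function name `router_z_loss` is hypothetical, and how the layer scales and registers the loss is not shown here.

```python
import numpy as np


def router_z_loss(router_logits: np.ndarray) -> float:
    """Assumed z-loss: mean of (logsumexp over experts)^2 across all tokens.

    router_logits: [num_groups, tokens_per_group, num_experts]
    """
    # Numerically stable logsumexp over the expert axis.
    m = router_logits.max(axis=-1, keepdims=True)
    log_z = np.log(np.exp(router_logits - m).sum(axis=-1)) + m[..., 0]
    # Squaring log Z penalizes large logits; average over groups and tokens.
    return float(np.mean(log_z ** 2))


# Uniform logits over 4 experts give log Z = log(4) for every token.
print(router_z_loss(np.zeros((2, 3, 4))))
```

Unlike the softmax itself, this loss changes when all logits are shifted by a constant, which is what pushes the router toward small logits. Presumably the layer multiplies it by `router_z_loss_weight`, so the default of `0.0` would contribute nothing; treat that as an assumption.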
Methods
call
```python
call(
    inputs: tf.Tensor, *, expert_capacity: int, training: Optional[bool] = None
) -> RouterOutput
```
Computes dispatch and combine arrays for routing to experts.
Args | |
---|---|
`inputs` | Inputs to send to experts of shape |
`expert_capacity` | Each group will send this many tokens to each expert. |
`training` | If true, apply jitter noise during routing. If not provided, the value is taken from `tf.keras.backend`. |
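The `training` argument only matters when `jitter_noise > 0`. A common form of router jitter, assumed here rather than confirmed by this page, multiplies the router inputs by noise drawn uniformly from `[1 - jitter_noise, 1 + jitter_noise]` during training and leaves them untouched at inference. The sketch below uses the hypothetical name `apply_jitter`.

```python
import numpy as np


def apply_jitter(inputs: np.ndarray, jitter_noise: float, training: bool,
                 rng: np.random.Generator) -> np.ndarray:
    """Assumed multiplicative jitter on router inputs (sketch only)."""
    if not training or jitter_noise <= 0.0:
        # Inference, or jitter disabled: inputs pass through unchanged.
        return inputs
    # Scale each element by a factor drawn uniformly from [1 - eps, 1 + eps).
    noise = rng.uniform(1.0 - jitter_noise, 1.0 + jitter_noise, size=inputs.shape)
    return inputs * noise


rng = np.random.default_rng(0)
x = np.ones((2, 5, 3))  # [num_groups, tokens_per_group, hidden_dim] (assumed)
noisy = apply_jitter(x, jitter_noise=0.1, training=True, rng=rng)
clean = apply_jitter(x, jitter_noise=0.1, training=False, rng=rng)
```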
Returns | |
---|---|
Router indices or mask arrays (depending on router type). |
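Mask arrays like these are typically consumed with two einsums: one gathers each expert's selected tokens, and the other scatters the expert outputs back to token positions, weighted by the router probabilities. The sketch assumes a dispatch mask and a combine array of shape `[groups, tokens, experts, capacity]`; `dispatch_and_combine` and `expert_fn` are hypothetical names, not part of `tfm`.

```python
import numpy as np


def dispatch_and_combine(inputs, dispatch_mask, combine_array, expert_fn):
    """Route tokens through experts using masked arrays (illustrative sketch).

    inputs:        [groups, tokens, hidden]
    dispatch_mask: [groups, tokens, experts, capacity], 0/1 entries
    combine_array: [groups, tokens, experts, capacity], router weights
    expert_fn:     hypothetical callable (expert_id, [groups, capacity, hidden])
                   -> array of the same shape
    """
    # Gather each expert's selected tokens: [experts, groups, capacity, hidden].
    expert_inputs = np.einsum("gtec,gth->egch", dispatch_mask, inputs)
    expert_outputs = np.stack(
        [expert_fn(e, expert_inputs[e]) for e in range(expert_inputs.shape[0])]
    )
    # Scatter back to token positions, weighting by router probabilities.
    return np.einsum("gtec,egch->gth", combine_array, expert_outputs)


# Toy case: 1 group, 3 tokens, hidden size 2, 2 experts, capacity 1.
inputs = np.arange(6, dtype=float).reshape(1, 3, 2)
dispatch = np.zeros((1, 3, 2, 1))
dispatch[0, 0, 0, 0] = 1.0  # expert 0 picked token 0
dispatch[0, 2, 1, 0] = 1.0  # expert 1 picked token 2
combine = dispatch * np.array([0.6, 0.7])[None, None, :, None]
out = dispatch_and_combine(inputs, dispatch, combine, lambda e, x: x * (e + 1))
# Token 1 was not selected by any expert, so its output row is all zeros.
print(out[0])
```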