Feed-forward layer with multiple experts.
tfm.nlp.layers.FeedForwardExperts(
    num_experts: int,
    d_ff: int,
    *,
    inner_dropout: float = 0.0,
    output_dropout: float = 0.0,
    activation: Callable[[tf.Tensor], tf.Tensor] = tf.keras.activations.gelu,
    kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER,
    bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER,
    name: str = 'experts',
    **kwargs
)
Note that call() takes inputs with shape [num_groups, num_experts, expert_capacity, hidden_dim] which is different from the usual [batch_size, seq_len, hidden_dim] used by the FeedForward layer.
The experts are independent FeedForward layers of the same shape: instead of the usual kernel of shape [hidden_dim, out_dim], the kernel has shape [num_experts, hidden_dim, out_dim], with one slice per expert.
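The per-expert kernel contraction can be sketched with a NumPy einsum (illustrative shapes only; the actual layer also adds a bias, activation, dropout, and an output projection):

```python
import numpy as np

num_groups, num_experts, expert_capacity = 2, 4, 3
hidden_dim, d_ff = 8, 16

# Tokens already routed to experts: one slot per
# (group, expert, capacity) position.
x = np.random.rand(num_groups, num_experts, expert_capacity, hidden_dim)

# Each expert has its own kernel, stacked along the leading axis.
w_in = np.random.rand(num_experts, hidden_dim, d_ff)

# Contract hidden_dim per expert: the expert axis `e` is
# shared between operands, not summed over.
h = np.einsum('gech,ehf->gecf', x, w_in)
assert h.shape == (num_groups, num_experts, expert_capacity, d_ff)
```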
call(
    inputs: tf.Tensor,
    *,
    training: Optional[bool] = None
) -> tf.Tensor

Applies the layer to inputs.

Args:
    inputs: Inputs of shape [num_groups, num_experts, expert_capacity, hidden_dim].
    training: Only apply dropout during training.

Returns:
    Transformed inputs with the same shape as inputs.
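A minimal NumPy sketch of the full expert feed-forward pass, showing that the output keeps the input shape (the gelu approximation and the omission of biases and dropout are simplifications, not the layer's exact implementation):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (assumption; tf.keras uses an
    # equivalent formulation).
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

num_groups, num_experts, expert_capacity = 2, 4, 3
hidden_dim, d_ff = 8, 16

x = np.random.rand(num_groups, num_experts, expert_capacity, hidden_dim)
w_in = np.random.rand(num_experts, hidden_dim, d_ff) * 0.02
w_out = np.random.rand(num_experts, d_ff, hidden_dim) * 0.02

h = gelu(np.einsum('gech,ehf->gecf', x, w_in))  # expand per expert
y = np.einsum('gecf,efh->gech', h, w_out)       # project back per expert

assert y.shape == x.shape  # same shape as inputs
```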