inputs: Any, output_range: Optional[tf.Tensor] = None
) -> Any
Transformer self-attention encoder block call.
inputs: A single tensor or a list of tensors. Pass the input tensor alone
as a single sequence of embeddings. Pass [input tensor, attention mask] to
add an attention mask. Pass [query tensor, key value tensor, attention
mask] to give the multi-head attention separate input streams for the
query and the key/value.
output_range: The sequence output range, [0, output_range), used to slice
the target sequence. None means the target sequence is not sliced. To
leave model training unchanged, set output_range only for serving.
Returns: An output tensor with the same dimensions as the input/query tensor.
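
The three accepted input forms and the effect of output_range can be sketched in plain Python. This is an illustration only, not the library's code: FakeTensor, unpack_inputs, and target_shape are hypothetical names, the shapes (including the [batch, seq_len, seq_len] mask) are assumptions, and the sketch assumes that slicing the target sequence also shortens the sequence axis of the output.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for tf.Tensor that only tracks a shape."""
    shape: Tuple[int, ...]

    def slice_seq(self, stop: int) -> "FakeTensor":
        # Mimics tensor[:, 0:stop, ...] on the sequence axis (axis 1).
        new_len = min(stop, self.shape[1])
        return FakeTensor((self.shape[0], new_len) + self.shape[2:])


def unpack_inputs(inputs: Any) -> Tuple[Any, Optional[Any], Optional[Any]]:
    """Hypothetical helper: split inputs into (query, key_value, mask)."""
    if isinstance(inputs, (list, tuple)):
        if len(inputs) == 2:
            query, mask = inputs
            return query, None, mask
        if len(inputs) == 3:
            query, key_value, mask = inputs
            return query, key_value, mask
        raise ValueError(f"Expected 2 or 3 inputs, got {len(inputs)}")
    # A bare tensor: no separate key/value stream and no mask.
    return inputs, None, None


def target_shape(inputs: Any, output_range: Optional[int] = None) -> Tuple[int, ...]:
    """Shape of the (possibly sliced) target sequence under the assumptions above."""
    query, _key_value, _mask = unpack_inputs(inputs)
    if output_range is not None:
        # Keep only positions [0, output_range) of the target sequence.
        query = query.slice_seq(output_range)
    return query.shape


x = FakeTensor((2, 8, 16))    # assumed [batch, seq_len, width]
mask = FakeTensor((2, 8, 8))  # assumed [batch, seq_len, seq_len]

print(target_shape(x))                             # (2, 8, 16)
print(target_shape([x, mask]))                     # (2, 8, 16)
print(target_shape([x, x, mask], output_range=1))  # (2, 1, 16)
```

The last call mirrors a typical serving setup in which only the first position is needed, so the target is sliced to [0, 1) while the key/value stream keeps the full sequence.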