Dot-product attention layer, a.k.a. Luong-style attention.

Inherits From: `Layer`, `Operation`
tf.keras.layers.Attention(
    use_scale=False,
    score_mode='dot',
    dropout=0.0,
    seed=None,
    **kwargs
)
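A minimal usage sketch (shapes and random values are illustrative, not from the original doc; assumes TensorFlow is installed):

```python
import tensorflow as tf

# Illustrative shapes: batch_size=2, Tq=3, Tv=4, dim=5.
query = tf.random.normal((2, 3, 5))
value = tf.random.normal((2, 4, 5))

# Default configuration: unscaled dot-product scores.
attention = tf.keras.layers.Attention(use_scale=False, score_mode="dot")

# With a 2-element input list, `value` is also used as the `key`.
output = attention([query, value])
print(output.shape)  # (2, 3, 5)
```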
Inputs are a list with 2 or 3 elements:

- A `query` tensor of shape `(batch_size, Tq, dim)`.
- A `value` tensor of shape `(batch_size, Tv, dim)`.
- An optional `key` tensor of shape `(batch_size, Tv, dim)`. If none is supplied, `value` will be used as the `key`.
The calculation follows the steps:

- Calculate attention scores using `query` and `key` with shape `(batch_size, Tq, Tv)`.
- Use scores to calculate a softmax distribution with shape `(batch_size, Tq, Tv)`.
- Use the softmax distribution to create a linear combination of `value` with shape `(batch_size, Tq, dim)`.
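The three steps above can be sketched in plain NumPy (a minimal illustration of the math, not the layer's actual implementation):

```python
import numpy as np

def dot_product_attention(query, key, value):
    # Step 1: attention scores via query-key dot products,
    # shape (batch_size, Tq, Tv).
    scores = np.einsum("bqd,bvd->bqv", query, key)
    # Step 2: softmax over the Tv axis, shape (batch_size, Tq, Tv).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Step 3: linear combination of value, shape (batch_size, Tq, dim).
    return np.einsum("bqv,bvd->bqd", weights, value)

# Illustrative shapes: batch_size=2, Tq=3, Tv=4, dim=5.
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 3, 5))
v = rng.standard_normal((2, 4, 5))

out = dot_product_attention(q, v, v)  # key defaults to value
print(out.shape)  # (2, 3, 5)
```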
Output | |
---|---|
Attention outputs of shape `(batch_size, Tq, dim)`. |
(Optional) Attention scores after masking and softmax with shape `(batch_size, Tq, Tv)`. |
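The optional scores output is requested by passing `return_attention_scores=True` when calling the layer (a sketch with illustrative shapes; assumes TensorFlow is installed):

```python
import tensorflow as tf

# Illustrative shapes: batch_size=2, Tq=3, Tv=4, dim=5.
query = tf.random.normal((2, 3, 5))
value = tf.random.normal((2, 4, 5))

layer = tf.keras.layers.Attention()
# The layer returns both the outputs and the softmax scores.
outputs, scores = layer([query, value], return_attention_scores=True)
print(outputs.shape)  # (2, 3, 5)
print(scores.shape)   # (2, 3, 4)
```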
Methods
from_config
@classmethod
from_config(
    config
)
Creates a layer from its config.
This method is the reverse of `get_config`, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by `Network`), nor weights (handled by `set_weights`).
Args | |
---|---|
`config` | A Python dictionary, typically the output of `get_config`. |
Returns | |
---|---|
A layer instance. |
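A round-trip through `get_config` and `from_config` might look like this (a sketch with illustrative constructor arguments; assumes TensorFlow is installed):

```python
import tensorflow as tf

layer = tf.keras.layers.Attention(use_scale=True, dropout=0.1)
config = layer.get_config()

# from_config rebuilds an equivalent (unbuilt) layer from the dictionary;
# weights, if any, would have to be restored separately via set_weights.
clone = tf.keras.layers.Attention.from_config(config)
print(clone.get_config()["use_scale"])
print(clone.get_config()["dropout"])
```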
symbolic_call
symbolic_call(
*args, **kwargs
)