tfa.seq2seq.LuongMonotonicAttention

Monotonic attention mechanism with Luong-style energy function.

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it can't attend to any prior points at subsequent output timesteps. It achieves this by using the _monotonic_probability_fn instead of softmax to construct its attention distributions. Otherwise, it is equivalent to LuongAttention. This approach is proposed in:

Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017.
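
As a rough illustration of the monotonic constraint, the plain-Python sketch below implements the recursive form of the expected attention from the cited paper for a single sequence. The function name and the list-based inputs are assumptions made for this example; the library itself works on batched tensors through tfa.seq2seq.monotonic_attention.

```python
def monotonic_attention_recursive(p_choose, previous_attention):
    """Expected monotonic attention for one output step (illustrative sketch).

    p_choose[j]: probability of stopping at memory position j.
    previous_attention[j]: attention over the memory at the previous step.
    """
    attention = []
    q = 0.0
    for j, (p, prev) in enumerate(zip(p_choose, previous_attention)):
        if j == 0:
            # Nothing precedes the first position, so q is just the old mass.
            q = prev
        else:
            # Mass that was not stopped at j - 1, plus mass that was already at j.
            q = (1.0 - p_choose[j - 1]) * q + prev
        attention.append(p * q)
    return attention


# Start from a dirac distribution on the first memory position.
start = [1.0, 0.0, 0.0]
soft = monotonic_attention_recursive([0.5, 0.5, 0.5], start)
hard = monotonic_attention_recursive([0.0, 1.0, 0.0], start)
```

Because probability mass can only stay in place or flow to later positions, an alignment that has moved past a memory entry never returns to it.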

Args:

  • units: The depth of the query mechanism.
  • memory: The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • memory_sequence_length: (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • scale: Python boolean. Whether to scale the energy term.
  • sigmoid_noise: Standard deviation of pre-sigmoid noise. See the docstring for _monotonic_probability_fn for more information.
  • sigmoid_noise_seed: (optional) Random seed for pre-sigmoid noise.
  • score_bias_init: Initial value for score bias scalar. It's recommended to initialize this to a negative value when the length of the memory is large.
  • mode: How to compute the attention distribution. Must be one of 'recursive', 'parallel', or 'hard'. See the docstring for tfa.seq2seq.monotonic_attention for more information.
  • dtype: The data type for the query and memory layers of the attention mechanism.
  • name: Name to use when creating ops.
  • **kwargs: Dictionary that contains other common arguments for layer creation.
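
To make the roles of scale, score_bias_init, sigmoid_noise, and mode more concrete, here is a minimal plain-Python sketch of how per-position selection probabilities could be derived from Luong-style energies. It assumes dot-product energies, an additive scalar bias, and Gaussian pre-sigmoid noise; the function and parameter names (choose_probabilities, scale_g, score_bias, rng) are hypothetical and do not appear in the library.

```python
import math
import random


def choose_probabilities(query, keys, scale_g=1.0, score_bias=0.0,
                         sigmoid_noise=0.0, mode="parallel", rng=None):
    """Selection probability per memory position (illustrative sketch)."""
    rng = rng or random.Random(0)
    probs = []
    for key in keys:
        # Luong-style energy: (optionally scaled) dot product plus a bias.
        score = scale_g * sum(q * k for q, k in zip(query, key)) + score_bias
        if sigmoid_noise > 0:
            # Assumed form of the pre-sigmoid noise: scaled standard normal.
            score += sigmoid_noise * rng.gauss(0.0, 1.0)
        if mode == "hard":
            # Hard mode: threshold the energy instead of squashing it.
            probs.append(1.0 if score > 0 else 0.0)
        else:
            probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs


query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 3.0]]
soft_probs = choose_probabilities(query, keys, score_bias=-2.0)
hard_probs = choose_probabilities(query, keys, score_bias=-2.0, mode="hard")
```

A negative bias lowers every selection probability, which is consistent with the recommendation to initialize score_bias_init to a negative value for long memories.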

Attributes:

  • activity_regularizer: Optional regularizer function for the output of this layer.
  • alignments_size
  • dtype
  • dynamic
  • input: Retrieves the input tensor(s) of a layer.

    Only applicable if the layer has exactly one input, i.e. if it is connected to one incoming layer.

  • input_mask: Retrieves the input mask tensor(s) of a layer.

    Only applicable if the layer has exactly one inbound node, i.e. if it is connected to one incoming layer.

  • input_shape: Retrieves the input shape(s) of a layer.

    Only applicable if the layer has exactly one input, i.e. if it is connected to one incoming layer, or if all inputs have the same shape.

  • input_spec

  • losses: Losses which are associated with this Layer.

    Variable regularization tensors are created when this property is accessed, so it is eager safe: accessing losses under a tf.GradientTape will propagate gradients back to the corresponding variables.

  • memory_initialized: Returns True if this attention mechanism has been initialized with a memory.

  • metrics

  • name: Returns the name of this module as passed or determined in the ctor.

    NOTE: This is not the same as the self.name_scope.name which includes parent module names.

  • name_scope: Returns a tf.name_scope instance for this class.

  • non_trainable_variables

  • non_trainable_weights

  • output: Retrieves the output tensor(s) of a layer.

    Only applicable if the layer has exactly one output, i.e. if it is connected to one incoming layer.

  • output_mask: Retrieves the output mask tensor(s) of a layer.

    Only applicable if the layer has exactly one inbound node, i.e. if it is connected to one incoming layer.

  • output_shape: Retrieves the output shape(s) of a layer.

    Only applicable if the layer has one output, or if all outputs have the same shape.

  • state_size

  • submodules: Sequence of all sub-modules.

    Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).

import tensorflow as tf

# Nested modules: c is a submodule of b, and b is a submodule of a.
a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []

  • trainable
  • trainable_variables: Sequence of trainable variables owned by this module and its submodules.

  • trainable_weights

  • updates

  • variables: Returns the list of all layer variables/weights.

    Alias of self.weights.

  • weights: Returns the list of all layer variables/weights.

Methods

__call__

Preprocess the inputs before calling base_layer.__call__().

Note that there are two situations here: one for setting up the memory, and one with the actual query and state.

  1. When the memory has not been configured, we just pass all the parameters to base_layer.__call__(), which will then invoke self.call() with proper inputs, which allows this class to set up the memory.
  2. When the memory has already been set up, the input should contain the query and state, and optionally the processed memory. If the processed memory is not included in the input, we will have to append it to the inputs and give it to base_layer.__call__(). The processed memory is the output of the first invocation of self.__call__(). If we don't add it here, then from the Keras perspective, the graph is disconnected since the output from the previous call is never used.

Args:

  • inputs: the input tensors.
  • **kwargs: dict, other keyword arguments for the __call__()
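
The two situations described for __call__ can be sketched in plain Python as follows. TwoPhaseMechanism is a hypothetical stand-in, not the library class: the doubling used as "processing" and the dot-product scores are placeholders chosen only to show the control flow.

```python
class TwoPhaseMechanism:
    """Hypothetical mechanism with a memory-setup call and query calls."""

    def __init__(self):
        self.processed_memory = None

    def __call__(self, inputs):
        if self.processed_memory is None:
            # Situation 1: the first call receives the memory and sets it up.
            self.processed_memory = [[2.0 * v for v in row] for row in inputs]
            return self.processed_memory
        # Situation 2: later calls receive [query, state] and, optionally,
        # the processed memory. Append it when the caller left it out.
        inputs = list(inputs)
        if len(inputs) == 2:
            inputs.append(self.processed_memory)
        query, state, memory = inputs
        scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
        return scores, state


mechanism = TwoPhaseMechanism()
processed = mechanism([[1.0, 0.0], [0.0, 1.0]])   # setup call
scores, state = mechanism([[1.0, 1.0], "state"])   # query call
```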

build

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

Arguments:

  • input_shape: Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

compute_mask

Computes an output mask tensor.

Arguments:

  • inputs: Tensor or list of tensors.
  • mask: Tensor or list of tensors.

Returns:

None or a tensor (or list of tensors, one per output tensor of the layer).

compute_output_shape

Computes the output shape of the layer.

If the layer has not been built, this method will call build on the layer. This assumes that the layer will later be used with inputs that match the input shape provided here.

Arguments:

  • input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per input tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.

Returns:

An output shape tuple.

count_params

Count the total number of scalars composing the weights.

Returns:

An integer count.

Raises:

  • ValueError: if the layer isn't yet built (in which case its weights aren't yet defined).

deserialize_inner_layer_from_config

Helper method that reconstructs the query and memory layers from the config.

In the get_config() method, the query and memory layer configs are serialized into a dict for persistence; this method performs the reverse action to reconstruct the layers from the config.

Args:

  • config: dict, the configs that will be used to reconstruct the object.
  • custom_objects: dict mapping class names (or function names) of custom (non-Keras) objects to class/functions.

Returns:

  • config: dict, the config with the layer instances created, which is ready to be used as init parameters.

from_config

Creates a layer from its config.

This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).

Arguments:

  • config: A Python dictionary, typically the output of get_config.

Returns:

A layer instance.

get_config

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Returns:

Python dictionary.
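
The relationship between get_config and from_config can be illustrated with a small plain-Python sketch. ConfigurableLayer and its fields are hypothetical; the point is only that the config carries constructor arguments, not weights or connectivity.

```python
class ConfigurableLayer:
    """Hypothetical layer whose constructor arguments form its config."""

    def __init__(self, units, scale=False, name="layer"):
        self.units = units
        self.scale = scale
        self.name = name
        # Weights are state, not configuration.
        self.weights = [0.0] * units

    def get_config(self):
        # Plain, serializable dictionary of constructor arguments.
        return {"units": self.units, "scale": self.scale, "name": self.name}

    @classmethod
    def from_config(cls, config):
        # Rebuild an equivalent, freshly initialized layer.
        return cls(**config)


original = ConfigurableLayer(units=3, scale=True, name="attention")
original.weights = [0.1, 0.2, 0.3]  # pretend these were trained
clone = ConfigurableLayer.from_config(original.get_config())
```

The clone shares the configuration of the original but starts with fresh weights; copying weights is a separate step (get_weights and set_weights).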

get_input_at

Retrieves the input tensor(s) of a layer at a given node.

Arguments:

  • node_index: Integer, index of the node from which to retrieve the attribute. E.g. node_index=0 will correspond to the first time the layer was called.

Returns:

A tensor (or list of tensors if the layer has multiple inputs).

Raises:

  • RuntimeError: If called in Eager mode.

get_input_mask_at

Retrieves the input mask tensor(s) of a layer at a given node.

Arguments:

  • node_index: Integer, index of the node from which to retrieve the attribute. E.g. node_index=0 will correspond to the first time the layer was called.

Returns:

A mask tensor (or list of tensors if the layer has multiple inputs).

get_input_shape_at

Retrieves the input shape(s) of a layer at a given node.

Arguments:

  • node_index: Integer, index of the node from which to retrieve the attribute. E.g. node_index=0 will correspond to the first time the layer was called.

Returns:

A shape tuple (or list of shape tuples if the layer has multiple inputs).

Raises:

  • RuntimeError: If called in Eager mode.

get_losses_for

Retrieves losses relevant to a specific set of inputs.

Arguments:

  • inputs: Input tensor or list/tuple of input tensors.

Returns:

List of loss tensors of the layer that depend on inputs.

get_output_at

Retrieves the output tensor(s) of a layer at a given node.

Arguments:

  • node_index: Integer, index of the node from which to retrieve the attribute. E.g. node_index=0 will correspond to the first time the layer was called.

Returns:

A tensor (or list of tensors if the layer has multiple outputs).

Raises:

  • RuntimeError: If called in Eager mode.

get_output_mask_at

Retrieves the output mask tensor(s) of a layer at a given node.

Arguments:

  • node_index: Integer, index of the node from which to retrieve the attribute. E.g. node_index=0 will correspond to the first time the layer was called.

Returns:

A mask tensor (or list of tensors if the layer has multiple outputs).

get_output_shape_at

Retrieves the output shape(s) of a layer at a given node.

Arguments:

  • node_index: Integer, index of the node from which to retrieve the attribute. E.g. node_index=0 will correspond to the first time the layer was called.

Returns:

A shape tuple (or list of shape tuples if the layer has multiple outputs).

Raises:

  • RuntimeError: If called in Eager mode.

get_updates_for

Retrieves updates relevant to a specific set of inputs.

Arguments:

  • inputs: Input tensor or list/tuple of input tensors.

Returns:

List of update ops of the layer that depend on inputs.

get_weights

Returns the current weights of the layer.

Returns:

Weights values as a list of numpy arrays.

initial_alignments

Creates the initial alignment values for the monotonic attentions.

Initializes to dirac distributions, i.e. [1, 0, 0, ...memory length..., 0] for all entries in the batch.

Args:

  • batch_size: int32 scalar, the batch_size.
  • dtype: The dtype.

Returns:

A dtype tensor shaped [batch_size, alignments_size] (alignments_size is the values' max_time).
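
A minimal plain-Python sketch of the dirac initialization described above, using nested lists instead of tensors. The function name dirac_alignments is chosen for this example only, and alignments_size is assumed to be at least 1.

```python
def dirac_alignments(batch_size, alignments_size):
    """One [1, 0, ..., 0] row per batch entry (illustrative sketch)."""
    return [[1.0] + [0.0] * (alignments_size - 1) for _ in range(batch_size)]


alignments = dirac_alignments(batch_size=2, alignments_size=4)
```

Starting with all of the mass on the first memory position means the first decoding step can only stay there or move forward.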

initial_state

Creates the initial state values for the AttentionWrapper class.

This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention).

The default behavior is to return the same output as initial_alignments.

Args:

  • batch_size: int32 scalar, the batch_size.
  • dtype: The dtype.

Returns:

A structure of all-zero tensors with shapes as described by state_size.

set_weights

Sets the weights of the layer, from Numpy arrays.

Arguments:

  • weights: a list of Numpy arrays. The number of arrays and their shapes must match the number and shapes of the layer's weights (i.e. it should match the output of get_weights).

Raises:

  • ValueError: If the provided weights list does not match the layer's specifications.
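
The shape check described for set_weights can be sketched in plain Python with nested lists standing in for arrays. TinyLayer and its helper are hypothetical and only illustrate the contract that the new values must mirror the output of get_weights.

```python
import copy


class TinyLayer:
    """Hypothetical layer holding a 2x2 kernel and a length-2 bias."""

    def __init__(self):
        self._weights = [[[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]]

    @staticmethod
    def _shape(value):
        # Shape of a non-empty, rectangular nested list.
        shape = []
        while isinstance(value, list):
            shape.append(len(value))
            value = value[0]
        return tuple(shape)

    def get_weights(self):
        return copy.deepcopy(self._weights)

    def set_weights(self, weights):
        if len(weights) != len(self._weights):
            raise ValueError("wrong number of weight arrays")
        for new, old in zip(weights, self._weights):
            if self._shape(new) != self._shape(old):
                raise ValueError("weight shape mismatch")
        self._weights = copy.deepcopy(weights)


layer = TinyLayer()
layer.set_weights([[[1.0, 2.0], [3.0, 4.0]], [5.0, 6.0]])
```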

setup_memory

Pre-processes the memory before actually querying it.

This should only be called once at the first invocation of call().

Args:

  • memory: The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • memory_sequence_length: (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • memory_mask: (Optional) The boolean tensor with shape [batch_size, max_time]. For any value equal to False, the corresponding value in memory should be ignored.
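
The masking behavior described for memory_sequence_length and memory_mask can be sketched in plain Python with nested lists shaped [batch_size, max_time, depth]. The helper names below are hypothetical, and this is only an illustration of the described effect, not the library's implementation.

```python
def lengths_to_mask(sequence_lengths, max_time):
    """Boolean mask [batch_size, max_time]: True for valid timesteps."""
    return [[t < length for t in range(max_time)] for length in sequence_lengths]


def apply_mask(memory, mask):
    """Zero out memory rows wherever the mask is False."""
    return [
        [list(row) if keep else [0.0] * len(row) for row, keep in zip(rows, flags)]
        for rows, flags in zip(memory, mask)
    ]


memory = [
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    [[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]],
]
mask = lengths_to_mask([3, 1], max_time=3)
masked = apply_mask(memory, mask)
```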

with_name_scope

Decorator to automatically enter the module name scope.

import tensorflow as tf

class MyModule(tf.Module):
  @tf.Module.with_name_scope
  def __call__(self, x):
    # Create the variable lazily, inside the module's name scope.
    if not hasattr(self, 'w'):
      self.w = tf.Variable(tf.random.normal([x.shape[1], 64]))
    return tf.matmul(x, self.w)

Using the above module would produce tf.Variables and tf.Tensors whose names included the module name:

mod = MyModule()
mod(tf.ones([8, 32]))
# ==> <tf.Tensor: ...>
mod.w
# ==> <tf.Variable ...'my_module/w:0'>

Args:

  • method: The method to wrap.

Returns:

The original method wrapped such that it enters the module's name scope.