Transformer model with Keras.

Implemented as described in:

The Transformer model consists of an encoder and decoder. The input is an int sequence (or a batch of sequences). The encoder produces a continuous representation, and the decoder uses the encoder output to generate probabilities for the output sequence.

vocab_size Size of vocabulary.
embedding_width Size of hidden layer for embedding.
dropout_rate Dropout probability.
padded_decode Whether max_sequence_length padding is used. If set to False, max_sequence_length padding is not used.
decode_max_length Maximum number of steps to decode a sequence.
extra_decode_length Beam search will run extra steps to decode.
beam_size Number of beams for beam search.
alpha The strength of length normalization for beam search.
encoder_layer An initialized encoder layer.
decoder_layer An initialized decoder layer.
eos_id Id of end of sentence token.
**kwargs Other keyword arguments.
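
To illustrate how alpha can change which beam wins, here is a minimal sketch. It assumes the GNMT-style length penalty ((5 + length) / 6) ** alpha; this page does not state the formula the model uses, so the formula and the helper names length_penalty and normalized_score are assumptions for illustration only.

```python
def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length penalty (an assumption, not confirmed by this page)."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(log_prob: float, length: int, alpha: float) -> float:
    """Divide a sequence log-probability by the length penalty."""
    return log_prob / length_penalty(length, alpha)


# Two hypothetical beam candidates: (sum of token log-probs, length in tokens).
short_candidate = (-4.0, 4)
long_candidate = (-5.0, 10)

for alpha in (0.0, 0.6, 1.0):
    short_score = normalized_score(*short_candidate, alpha)
    long_score = normalized_score(*long_candidate, alpha)
    winner = "short" if short_score > long_score else "long"
    print(f"alpha={alpha}: short={short_score:.3f} long={long_score:.3f} -> {winner}")
```

Under this assumed penalty, alpha = 0 disables normalization and the shorter candidate ranks higher, while larger values of alpha increasingly favor the longer candidate.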




Calculate target logits or inferred target sequences.

inputs A dictionary of tensors with the following optional features:
  inputs: int tensor with shape [batch_size, input_length].
  embedded_inputs: float tensor with shape [batch_size, input_length, embedding_width].
  targets: None or int tensor with shape [batch_size, target_length].
  input_masks: boolean tensor with shape [batch_size, input_length] marking the filled time steps. Required when embedded_inputs is provided.
The dictionary must contain either inputs, or both embedded_inputs and input_masks. In the second case, the projection of the integer tokens into the transformer embedding space is skipped.
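
The rule for which features must be present can be sketched as a small check. The helper check_call_inputs is hypothetical and not part of the library; it uses plain Python values in place of tensors, and its choice to prefer inputs when both paths are supplied is an assumption, since the text above does not cover that case.

```python
from typing import Any, Mapping


def check_call_inputs(inputs: Mapping[str, Any]) -> str:
    """Return which input path a dictionary selects, or raise ValueError.

    Hypothetical helper that mirrors the documented rule; not a library API.
    """
    has_tokens = inputs.get("inputs") is not None
    has_embedded = inputs.get("embedded_inputs") is not None
    has_masks = inputs.get("input_masks") is not None

    if has_tokens:
        # Assumption: integer token ids take precedence when both are given.
        return "token_ids"
    if has_embedded and has_masks:
        # The token-to-embedding projection would be skipped on this path.
        return "embedded"
    if has_embedded:
        raise ValueError("embedded_inputs requires input_masks")
    raise ValueError("provide inputs, or embedded_inputs with input_masks")


# Nested lists stand in for tensors of the documented shapes.
print(check_call_inputs({"inputs": [[5, 7, 2, 0]]}))
print(check_call_inputs({
    "embedded_inputs": [[[0.1, 0.2], [0.3, 0.4]]],
    "input_masks": [[True, False]],
}))
```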

If targets is defined, returns logits for each word in the target sequence as a float tensor with shape (batch_size, target_length, vocab_size). If targets is None, generates the output sequence one token at a time and returns a dictionary { outputs: (batch_size, decoded_length), scores: (batch_size, 1) }. Even when float16 is used, the output tensor(s) are always float32.
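
The two return forms described above can be summarized with a small shape helper. This is only a sketch: expected_output_shapes is a hypothetical function that restates the documented shapes, and decoded_length is passed in by hand because it depends on how decoding terminates.

```python
from typing import Dict, Optional, Tuple, Union

Shape = Tuple[int, ...]


def expected_output_shapes(
    batch_size: int,
    vocab_size: int,
    target_length: Optional[int] = None,
    decoded_length: Optional[int] = None,
) -> Union[Shape, Dict[str, Shape]]:
    """Restate the documented output shapes; hypothetical, not a library API."""
    if target_length is not None:
        # Targets provided: one logit vector per target position.
        return (batch_size, target_length, vocab_size)
    if decoded_length is None:
        raise ValueError("decoded_length is needed when no targets are given")
    # No targets: decoded token ids plus one score per sequence.
    return {
        "outputs": (batch_size, decoded_length),
        "scores": (batch_size, 1),
    }


print(expected_output_shapes(batch_size=8, vocab_size=32000, target_length=20))
print(expected_output_shapes(batch_size=8, vocab_size=32000, decoded_length=35))
```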

NotImplementedError If the padded decode method is used on CPU/GPUs.