

Transformer model with Keras.

Implemented as described in: https://arxiv.org/pdf/1706.03762.pdf

The Transformer model consists of an encoder and a decoder. The input is an integer sequence (or a batch of sequences). The encoder produces a continuous representation of the input, and the decoder uses the encoder output to generate probabilities for the output sequence.

vocab_size Size of vocabulary.
embedding_width Size of hidden layer for embedding.
dropout_rate Dropout probability.
padded_decode Whether max_sequence_length padding is used. If set to False, max_sequence_length padding is not used.
decode_max_length Maximum number of steps to decode a sequence.
extra_decode_length Number of extra steps that beam search runs when decoding.
beam_size Number of beams for beam search.
alpha The strength of length normalization for beam search.
encoder_layer An initialized encoder layer.
decoder_layer An initialized decoder layer.
dtype float dtype.
eos_id Id of end of sentence token.
**kwargs Other keyword arguments.
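
The page says only that alpha sets the strength of length normalization for beam search. As a rough illustration, the sketch below assumes the GNMT-style penalty ((5 + length) / 6) ** alpha, which this page does not confirm. The names length_penalty and normalized_score are hypothetical helpers, not part of the documented API.

```python
def length_penalty(length, alpha):
    """Assumed GNMT-style penalty: ((5 + length) / 6) ** alpha.

    alpha = 0 disables normalization; larger alpha favors longer sequences.
    """
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(log_prob, length, alpha):
    """Divide a sequence's summed log-probability by the length penalty."""
    return log_prob / length_penalty(length, alpha)


# A longer hypothesis accumulates a more negative log-probability.
# Normalizing with alpha > 0 makes it less penalized than the raw sum.
raw_long = normalized_score(log_prob=-6.0, length=12, alpha=0.0)
normalized_long = normalized_score(log_prob=-6.0, length=12, alpha=0.6)
print(raw_long, normalized_long)
```

With alpha set to 0 the score is the unmodified log-probability, so beam search tends to prefer short outputs; raising alpha reduces that bias.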




Calculate target logits or inferred target sequences.

inputs A dictionary of tensors. Feature inputs: int tensor with shape [batch_size, input_length]. Feature targets (optional): None or int tensor with shape [batch_size, target_length].

If targets is defined, returns logits for each word in the target sequence: a float tensor with shape [batch_size, target_length, vocab_size].
If targets is None, generates the output sequence one token at a time and returns a dictionary { outputs: [batch_size, decoded_length], scores: [batch_size, float] }.
Even when float16 is used, the output tensor(s) are always float32.
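
The two return modes described above can be summarized as a shape contract. The sketch below is a hypothetical helper, not the model itself: it only computes the shapes this page documents from the shapes of the entries in the inputs dictionary. The decoded length is not known before decoding, so it is passed in as a placeholder, and the one-score-per-sequence shape for scores is an assumed reading of the text above.

```python
def documented_output_shapes(inputs_shape, targets_shape=None,
                             vocab_size=32000, decoded_length=None):
    """Return the output shapes described in the docs (illustrative only).

    inputs_shape:  (batch_size, input_length)
    targets_shape: (batch_size, target_length) or None
    """
    batch_size, _input_length = inputs_shape
    if targets_shape is not None:
        # Training mode: one logit vector per target position.
        _, target_length = targets_shape
        return (batch_size, target_length, vocab_size)
    # Inference mode: decoded token ids plus a score per sequence
    # (assumed here to be one float per batch element).
    return {
        "outputs": (batch_size, decoded_length),
        "scores": (batch_size,),
    }


train_shape = documented_output_shapes((8, 20), targets_shape=(8, 25))
infer_shapes = documented_output_shapes((8, 20), decoded_length=30)
print(train_shape)
print(infer_shapes)
```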

NotImplementedError If the padded decode method is used on CPU/GPUs.