Transformer parameters.

num_layers Dataclass field
d_model Dataclass field
d_kv Dataclass field
num_heads Dataclass field
d_ff Dataclass field
vocab_size Dataclass field
target_vocab_size Dataclass field
dropout_rate Dataclass field
layer_norm_epsilon Dataclass field
shared_embedding Dataclass field
vocab_embeddings_initializer Dataclass field
relative_attention_num_buckets Dataclass field
relative_attention_max_distance Dataclass field
relative_embeddings_initializer Dataclass field
weight_initializer Dataclass field
bias_initializer Dataclass field
rescale_query Dataclass field
bidirectional Dataclass field
ffn_activations Dataclass field
logits_via_embedding Dataclass field
num_decoder_layers Dataclass field
one_hot_embedding Dataclass field
layer_sharing Dataclass field
use_shared_relative_position_bias Dataclass field
return_attention_scores Dataclass field



He normal initializer.

Also available via the shortcut function tf.keras.initializers.he_normal.

It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in), where fan_in is the number of input units in the weight tensor.
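As a rough illustration of the formula above, the sketch below computes sqrt(2 / fan_in) and draws zero-centered samples with the standard library only. It is not TensorFlow's implementation: the helper names are hypothetical, and the rejection cutoff of two standard deviations is an assumption made for this example.

```python
import math
import random
from typing import List


def he_normal_stddev(fan_in: int) -> float:
    """Return sqrt(2 / fan_in), the stddev quoted in the text above."""
    if fan_in <= 0:
        raise ValueError("fan_in must be positive")
    return math.sqrt(2.0 / fan_in)


def sample_truncated_normal(stddev: float, count: int, seed: int = 0,
                            cutoff: float = 2.0) -> List[float]:
    """Draw `count` zero-centered samples, rejecting any beyond cutoff * stddev.

    The two-standard-deviation cutoff is an assumption of this sketch.
    """
    rng = random.Random(seed)
    samples: List[float] = []
    while len(samples) < count:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= cutoff * stddev:
            samples.append(value)
    return samples


stddev = he_normal_stddev(fan_in=8)  # sqrt(2 / 8) = 0.5
weights = sample_truncated_normal(stddev, count=16)
```

A larger fan_in gives a smaller stddev, so layers with many inputs start with proportionally smaller weights.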


import tensorflow as tf

# Standalone usage:
initializer = tf.keras.initializers.HeNormal()
values = initializer(shape=(2, 2))

# Usage in a Keras layer:
initializer = tf.keras.initializers.HeNormal()
layer = tf.keras.layers.Dense(3, kernel_initializer=initializer)

seed A Python integer. Used to make the behavior of the initializer deterministic. Note that a seeded initializer will not produce the same random values across multiple calls, but multiple initializers will produce the same sequence when constructed with the same seed value.
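The seeding behavior described above can be pictured with a toy analogue built on Python's `random` module. This is only a sketch of the stated semantics, not how the Keras initializer is implemented; `SeededInitializer` is a hypothetical class introduced for illustration.

```python
import random
from typing import List


class SeededInitializer:
    """Toy stand-in: holds one seeded generator that advances on every call."""

    def __init__(self, seed: int):
        self._rng = random.Random(seed)

    def __call__(self, count: int) -> List[float]:
        return [self._rng.gauss(0.0, 1.0) for _ in range(count)]


first = SeededInitializer(seed=42)
second = SeededInitializer(seed=42)

a1, a2 = first(4), first(4)    # successive calls on one initializer
b1, b2 = second(4), second(4)  # separate initializer, same seed

calls_differ = a1 != a2                 # one initializer, different values per call
sequences_match = (a1, a2) == (b1, b2)  # same seed, same sequence of calls
```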



bias_initializer None
bidirectional True
dropout_rate 0.0
ffn_activations ('relu',)
layer_norm_epsilon 1e-06
layer_sharing False
logits_via_embedding True
num_decoder_layers None
one_hot_embedding True
relative_attention_max_distance 128
relative_attention_num_buckets 32
relative_embeddings_initializer None
rescale_query False
return_attention_scores False
shared_embedding False
target_vocab_size None
use_shared_relative_position_bias True
vocab_embeddings_initializer None
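
To show how the fields and defaults listed above fit together, here is a minimal stand-in dataclass written with the standard library only. It is a sketch, not the library's class: the name `TransformerParamsSketch`, the type annotations, and the `None` placeholder for `weight_initializer` are assumptions, and fields with no listed default are treated as required.

```python
import dataclasses
from typing import Any, Optional, Sequence


@dataclasses.dataclass
class TransformerParamsSketch:
    """Hypothetical stand-in mirroring the fields and defaults listed above."""
    # Fields with no default in the listing are treated as required here.
    num_layers: int
    d_model: int
    d_kv: int
    num_heads: int
    d_ff: int
    vocab_size: int
    target_vocab_size: Optional[int] = None
    dropout_rate: float = 0.0
    layer_norm_epsilon: float = 1e-06
    shared_embedding: bool = False
    vocab_embeddings_initializer: Optional[Any] = None
    relative_attention_num_buckets: int = 32
    relative_attention_max_distance: int = 128
    relative_embeddings_initializer: Optional[Any] = None
    # The listing gives no default for weight_initializer; None is a placeholder.
    weight_initializer: Optional[Any] = None
    bias_initializer: Optional[Any] = None
    rescale_query: bool = False
    bidirectional: bool = True
    ffn_activations: Sequence[str] = ("relu",)
    logits_via_embedding: bool = True
    num_decoder_layers: Optional[int] = None
    one_hot_embedding: bool = True
    layer_sharing: bool = False
    use_shared_relative_position_bias: bool = True
    return_attention_scores: bool = False


# Illustrative sizes only; they are not taken from the documentation.
params = TransformerParamsSketch(
    num_layers=2, d_model=64, d_kv=16, num_heads=4, d_ff=128, vocab_size=1000)

# Override selected defaults while keeping the rest unchanged.
decoder_params = dataclasses.replace(
    params, bidirectional=False, dropout_rate=0.1)
```

Using `dataclasses.replace` leaves the original instance untouched, which is convenient when encoder and decoder settings differ in only a few fields.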