Optimization parameters for Adam with TPU embeddings.
tf.tpu.experimental.embedding.Adam(
learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, lazy_adam=True,
sum_inside_sqrt=True, use_gradient_accumulation=True, clip_weight_min=None,
clip_weight_max=None, weight_decay_factor=None,
multiply_weight_decay_factor_by_learning_rate=None,
slot_variable_creation_fn=None
)
Pass this to tf.tpu.experimental.embedding.TPUEmbedding via the optimizer argument to set the global optimizer and its parameters:
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
...
optimizer=tf.tpu.experimental.embedding.Adam(0.1))
This can also be used in a tf.tpu.experimental.embedding.TableConfig as the optimizer parameter to set a table-specific optimizer. This overrides the global embedding optimizer and its parameters defined above:
table_one = tf.tpu.experimental.embedding.TableConfig(
vocabulary_size=...,
dim=...,
optimizer=tf.tpu.experimental.embedding.Adam(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
vocabulary_size=...,
dim=...)
feature_config = (
tf.tpu.experimental.embedding.FeatureConfig(
table=table_one),
tf.tpu.experimental.embedding.FeatureConfig(
table=table_two))
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
feature_config=feature_config,
batch_size=...,
optimizer=tf.tpu.experimental.embedding.Adam(0.1))
In the above example, the first feature is looked up in a table that has a learning rate of 0.2, while the second feature is looked up in a table that has a learning rate of 0.1, because table_two sets no optimizer of its own and therefore uses the global Adam(0.1).
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impacts on the optimizer algorithm.
Args  

learning_rate

The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate. 
beta_1

A float value. The exponential decay rate for the 1st moment estimates. 
beta_2

A float value. The exponential decay rate for the 2nd moment estimates. 
epsilon

A small constant for numerical stability. 
lazy_adam

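Use lazy Adam instead of standard Adam. Lazy Adam updates the moment estimates and weights only for the embedding rows looked up in the current batch, rather than for every row in the table, so it trains faster. 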
sum_inside_sqrt

When this is true, the Adam update formula is changed from m / (sqrt(v) + epsilon) to m / sqrt(v + epsilon**2). This option improves the performance of TPU training and is not expected to harm model quality.

use_gradient_accumulation

Setting this to False makes the embedding gradient calculation less accurate but faster.

clip_weight_min

The minimum value to clip by; None means -infinity. 
clip_weight_max

The maximum value to clip by; None means +infinity. 
weight_decay_factor

Amount of weight decay to apply; None means that the weights are not decayed. 
multiply_weight_decay_factor_by_learning_rate

If true, weight_decay_factor is multiplied by the current learning rate.

slot_variable_creation_fn

A callable taking two parameters, a variable and a list of slot names to create for it. This function should return a dict with the slot names as keys and the created variables as values. When set to None (the default), the built-in variable creation is used. 
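The learning_rate entry above accepts a callable taking no arguments. The following is a minimal pure-Python sketch of such a callable, an exponentially decaying rate. The factory make_exponential_decay_lr and its get_step hook are hypothetical names used only for illustration. In TensorFlow code, get_step would typically read a tf.Variable step counter rather than a Python dict, so that the value is re-read each time the callable runs inside a tf.function; a plain Python value would be frozen at trace time.

```python
def make_exponential_decay_lr(initial_lr, decay_rate, decay_steps, get_step):
    """Return a zero-argument callable usable as a dynamic learning rate.

    get_step is a hypothetical hook that returns the current training step.
    """
    def learning_rate():
        # Re-read the step on every call so the rate tracks training progress.
        return initial_lr * decay_rate ** (get_step() / decay_steps)
    return learning_rate


# A mutable dict stands in for a real step counter in this sketch.
step = {"value": 0}
lr = make_exponential_decay_lr(0.1, 0.5, 100, lambda: step["value"])
print(lr())  # 0.1
step["value"] = 100
print(lr())  # 0.05
```

Under these assumptions, the resulting callable would be passed as learning_rate=lr, with no arguments required at call time.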
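To make the lazy_adam and sum_inside_sqrt entries above concrete, here is a minimal pure-Python sketch of one Adam step on a toy embedding table. It is not the TPU kernel: the helpers adam_denominator, adam_embedding_step and run_two_steps are hypothetical, the table is a plain list of rows, and bias correction is left out for brevity. It only illustrates the two behaviours the descriptions state: lazy Adam touches just the rows looked up in the batch, and sum_inside_sqrt swaps the form of the update's denominator.

```python
import math


def adam_denominator(v, epsilon, sum_inside_sqrt):
    """Denominator of the Adam update for a single element."""
    if sum_inside_sqrt:
        return math.sqrt(v + epsilon ** 2)  # m / sqrt(v + epsilon**2)
    return math.sqrt(v) + epsilon           # m / (sqrt(v) + epsilon)


def adam_embedding_step(table, m, v, grads, learning_rate=0.001, beta_1=0.9,
                        beta_2=0.999, epsilon=1e-07, lazy_adam=True,
                        sum_inside_sqrt=True):
    """One Adam step on a table stored as a list of rows, updated in place.

    grads maps row index -> gradient row for the rows looked up in this batch.
    Bias correction is omitted to keep the sketch short.
    """
    dim = len(table[0])
    # Lazy Adam visits only the rows present in this batch; standard Adam
    # visits every row and treats missing rows as having a zero gradient.
    rows = sorted(grads) if lazy_adam else range(len(table))
    for r in rows:
        g = grads.get(r, [0.0] * dim)
        for j in range(dim):
            m[r][j] = beta_1 * m[r][j] + (1 - beta_1) * g[j]
            v[r][j] = beta_2 * v[r][j] + (1 - beta_2) * g[j] ** 2
            denom = adam_denominator(v[r][j], epsilon, sum_inside_sqrt)
            table[r][j] -= learning_rate * m[r][j] / denom


def run_two_steps(lazy_adam):
    """Row 1 is looked up in step 1 only; row 0 in step 2 only."""
    table = [[1.0, 1.0] for _ in range(3)]
    m = [[0.0, 0.0] for _ in range(3)]
    v = [[0.0, 0.0] for _ in range(3)]
    adam_embedding_step(table, m, v, {1: [0.5, -0.5]}, lazy_adam=lazy_adam)
    after_first = [row[:] for row in table]
    adam_embedding_step(table, m, v, {0: [0.5, -0.5]}, lazy_adam=lazy_adam)
    return after_first, table


for lazy in (True, False):
    after_first, table = run_two_steps(lazy)
    # With lazy Adam, row 1 is untouched in step 2 because it was not looked up.
    print(f"lazy_adam={lazy}: row 1 changed in step 2? {table[1] != after_first[1]}")
# lazy_adam=True: row 1 changed in step 2? False
# lazy_adam=False: row 1 changed in step 2? True
```

Since sqrt(v) <= sqrt(v + epsilon**2) <= sqrt(v) + epsilon, the two denominator forms never differ by more than epsilon, which is consistent with the statement that sum_inside_sqrt is not expected to harm model quality.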
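The clip_weight_min, clip_weight_max, weight_decay_factor and multiply_weight_decay_factor_by_learning_rate entries above can be read as the per-weight operations sketched below. This is a hedged illustration, not the TPU implementation: clip_weight and decay_weight are hypothetical helpers, the decay assumes a decoupled rule w - factor * w, and the sketch does not model where the TPU applies clipping or decay relative to the Adam update itself.

```python
import math


def clip_weight(w, clip_weight_min=None, clip_weight_max=None):
    """Clamp one weight; a bound of None leaves that side unbounded."""
    lower = -math.inf if clip_weight_min is None else clip_weight_min
    upper = math.inf if clip_weight_max is None else clip_weight_max
    return min(max(w, lower), upper)


def decay_weight(w, learning_rate, weight_decay_factor=None,
                 multiply_weight_decay_factor_by_learning_rate=False):
    """Assumed decoupled weight decay: shrink w toward zero by a fraction of itself."""
    if weight_decay_factor is None:
        return w  # None: the weight is not decayed
    factor = weight_decay_factor
    if multiply_weight_decay_factor_by_learning_rate:
        factor *= learning_rate  # scale the decay by the current learning rate
    return w - factor * w


print(clip_weight(5.0, clip_weight_max=2.0))            # 2.0
print(clip_weight(-5.0))                                # -5.0 (no bounds)
print(decay_weight(2.0, 0.1, weight_decay_factor=0.5))  # 1.0
print(decay_weight(2.0, 0.1, weight_decay_factor=0.5,
                   multiply_weight_decay_factor_by_learning_rate=True))
```

With multiply_weight_decay_factor_by_learning_rate set, the effective decay in the last call shrinks from 0.5 to 0.5 * 0.1, so the weight decays far less.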
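The slot_variable_creation_fn entry above describes a contract: a callable that receives a variable and a list of slot names and returns a dict from slot name to a newly created variable. The sketch below follows that contract in plain Python so it runs without TensorFlow. A list of rows stands in for a tf.Variable, zero initialization is an assumption (a common choice for Adam moment slots), and the slot names in the usage lines are hypothetical placeholders; the real names are supplied by the optimizer.

```python
def zero_slot_creation_fn(variable, slot_names):
    """Hypothetical slot creator following the documented two-argument contract.

    variable: the table variable (a list of rows stands in for a tf.Variable).
    slot_names: names of the slots the optimizer needs for this table.
    Returns a dict mapping each slot name to a newly created, zero-filled
    "variable" with the same shape as `variable`.
    """
    # Build a fresh list for every slot so no two slots share storage.
    return {name: [[0.0] * len(row) for row in variable] for name in slot_names}


table = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
slots = zero_slot_creation_fn(table, ["first_moment", "second_moment"])
print(sorted(slots))          # ['first_moment', 'second_moment']
print(slots["first_moment"])  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

In real TensorFlow code each value would be a variable created with the table's shape and dtype; the point of the sketch is only the input and output shape of the callable.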