tf.compat.v1.tpu.experimental.FtrlParameters

Optimization parameters for FTRL with TPU embeddings.

Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec via the optimization_parameters argument to set the optimizer and its parameters. See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec for more details.

estimator = tf.estimator.tpu.TPUEstimator(
    ...
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        ...
        optimization_parameters=tf.tpu.experimental.FtrlParameters(0.1),
        ...))

learning_rate A floating-point value. The learning rate.
learning_rate_power A float value, must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. See section 3.1 in the paper.
initial_accumulator_value The starting value for accumulators. Only zero or positive values are allowed.
l1_regularization_strength A float value, must be greater than or equal to zero.
l2_regularization_strength A float value, must be greater than or equal to zero.
use_gradient_accumulation Setting this to False makes the embedding gradient calculation less accurate but faster. See optimization_parameters.proto for details.
clip_weight_min The minimum value to clip weights by; None means -infinity.
clip_weight_max The maximum value to clip weights by; None means +infinity.
weight_decay_factor The amount of weight decay to apply; None means that the weights are not decayed.
multiply_weight_decay_factor_by_learning_rate If true, weight_decay_factor is multiplied by the current learning rate.
multiply_linear_by_learning_rate When true, multiplies the usages of the linear slot in the weight update by the learning rate. This is useful when ramping up learning rate from 0 (which would normally produce NaNs).
beta The beta parameter for FTRL.
allow_zero_accumulator Changes the implementation of the square root to allow for the case of initial_accumulator_value being zero. This will cause a slight performance drop.
clip_gradient_min The minimum value to clip gradients by; None means -infinity. use_gradient_accumulation must be True if this is set.
clip_gradient_max The maximum value to clip gradients by; None means +infinity. use_gradient_accumulation must be True if this is set.
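
The constraints listed for these arguments can be checked before constructing the object. The following is a minimal sketch in plain Python, not part of TensorFlow: the helper name check_ftrl_kwargs and the example values are hypothetical, and the rules it applies are only those stated in the argument descriptions above. Because this page does not state the default for use_gradient_accumulation, the sketch flags gradient clipping only when accumulation is explicitly set to False.

```python
def check_ftrl_kwargs(kwargs):
    """Return messages for documented FtrlParameters constraints that kwargs violate.

    Hypothetical helper for illustration; it only inspects a plain dict.
    """
    problems = []

    # learning_rate_power must be less than or equal to zero.
    power = kwargs.get("learning_rate_power")
    if power is not None and power > 0:
        problems.append("learning_rate_power must be <= 0")

    # These arguments must be zero or positive.
    for name in ("initial_accumulator_value",
                 "l1_regularization_strength",
                 "l2_regularization_strength"):
        value = kwargs.get(name)
        if value is not None and value < 0:
            problems.append(f"{name} must be >= 0")

    # Gradient clipping requires gradient accumulation. Only an explicit
    # False is flagged, since the default is not stated on this page.
    clipping = (kwargs.get("clip_gradient_min") is not None
                or kwargs.get("clip_gradient_max") is not None)
    if clipping and kwargs.get("use_gradient_accumulation") is False:
        problems.append("clip_gradient_min/max require use_gradient_accumulation=True")

    return problems


# Illustrative values only; they are not recommendations.
ftrl_kwargs = {
    "learning_rate": 0.1,
    "learning_rate_power": -0.5,
    "initial_accumulator_value": 0.1,
    "l1_regularization_strength": 0.0,
    "l2_regularization_strength": 0.0,
    "use_gradient_accumulation": True,
    "clip_gradient_min": -1.0,
    "clip_gradient_max": 1.0,
}

problems = check_ftrl_kwargs(ftrl_kwargs)
print(problems or "no documented constraint violated")

# With TensorFlow installed, the checked values could then be passed on:
# params = tf.compat.v1.tpu.experimental.FtrlParameters(**ftrl_kwargs)
```

Keeping the settings in a dict makes it possible to validate them without importing TensorFlow and to forward the same values unchanged as keyword arguments.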