tf.compat.v1.tpu.experimental.FtrlParameters

Optimization parameters for Ftrl with TPU embeddings.

tf.compat.v1.tpu.experimental.FtrlParameters(
    learning_rate, learning_rate_power=-0.5, initial_accumulator_value=0.1,
    l1_regularization_strength=0.0, l2_regularization_strength=0.0,
    use_gradient_accumulation=True, clip_weight_min=None, clip_weight_max=None,
    weight_decay_factor=None, multiply_weight_decay_factor_by_learning_rate=None
)

Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec via the optimization_parameters argument to set the optimizer and its parameters. See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec for more details.

estimator = tf.estimator.tpu.TPUEstimator(
    ...
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        ...
        optimization_parameters=tf.tpu.experimental.FtrlParameters(0.1),
        ...))

Args:

  • learning_rate: a floating point value. The learning rate.
  • learning_rate_power: A float value, must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. See section 3.1 in the paper.
  • initial_accumulator_value: The starting value for accumulators. Only zero or positive values are allowed.
  • l1_regularization_strength: A float value, must be greater than or equal to zero.
  • l2_regularization_strength: A float value, must be greater than or equal to zero.
  • use_gradient_accumulation: setting this to False makes embedding gradient calculation less accurate but faster. See optimization_parameters.proto for details.
  • clip_weight_min: the minimum value to clip by; None means -infinity.
  • clip_weight_max: the maximum value to clip by; None means +infinity.
  • weight_decay_factor: amount of weight decay to apply; None means that the weights are not decayed.
  • multiply_weight_decay_factor_by_learning_rate: if true, weight_decay_factor is multiplied by the current learning rate.
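
A minimal sketch of constructing FtrlParameters with the regularization and clipping arguments listed above; the numeric values are purely illustrative, not recommendations:

ftrl_params = tf.compat.v1.tpu.experimental.FtrlParameters(
    learning_rate=0.1,
    learning_rate_power=-0.5,          # default; use 0.0 for a fixed learning rate
    initial_accumulator_value=0.1,
    l1_regularization_strength=0.001,  # must be >= 0
    l2_regularization_strength=0.001,  # must be >= 0
    use_gradient_accumulation=True,
    clip_weight_min=-1.0,              # None would mean -infinity
    clip_weight_max=1.0)               # None would mean +infinity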