Optimizer that implements the FTRL algorithm.

Inherits From: Optimizer

This version has support for both online L2 (McMahan et al., 2013) and shrinkage-type L2, which is the addition of an L2 penalty to the loss function.


Ad-click prediction: McMahan et al., 2013 (pdf)

learning_rate A float value or a constant float Tensor.
learning_rate_power A float value, must be less or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. See section 3.1 in (McMahan et al., 2013).
initial_accumulator_value The starting value for accumulators. Only zero or positive values are allowed.
l1_regularization_strength A float value, must be greater than or equal to zero.
l2_regularization_strength A float value, must be greater than or equal to zero.
use_locking If True use locks for update operations.
name Optional name prefix for the operations created when applying gradients. Defaults to "Ftrl".
accum_name The suffix for the variable that keeps the gradient squared accumulator. If not present, defaults to name.
linear_name The suffix for the variable that keeps the linear gradient accumulator. If not present, defaults to name + "1".
l2_shrinkage_regularization_strength A float value, must be greater than or equal to zero. This differs from L2 above in that the L2 above is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. The FTRL formulation can be written as: w{t+1} = argminw(\hat{g}{1:t}w + L1||w||_1 + L2||w||_2^2), where \hat{g} = g + (2L2_shrinkagew), and g is the gradient of the loss function w.r.t. the weights w. Specifically, in the absence of L1 regularization, it is equivalent to the following update rule: w_{t+1} = w_t - lr_t / (beta + 2L2lr_t) * g_t - 2L2_shrinkagelr_t / (beta + 2L2lr_t) * w_t where lr_t is the learning rate at t. When input is sparse shrinkage will only happen on the active weights.
beta A float value; corresponds to the beta parameter in the paper.

ValueError If one of the arguments is invalid.



View source

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

grads_and_vars List of (gradient, variable) pairs as returned by compute_gradients().
global_step Optional Variable to increment by one after the variables have been updated.
name Optional name for the returned operation. Default to the name passed to the Optimizer constructor.

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

TypeError If grads_and_vars is malformed.
ValueError If none of the variables have gradients.
RuntimeError If you should use _distributed_apply() instead.


View source

Compute gradients of loss for the variables in var_list.

This is the first part of minimize(). It returns a list of (gradient, variable) pairs where "gradient" is the gradient for "variable". Note that "gradient" can be a