
tfp.optimizer.proximal_hessian_sparse_one_step

One step of (the outer loop of) the minimization algorithm.

tfp.optimizer.proximal_hessian_sparse_one_step(
    gradient_unregularized_loss,
    hessian_unregularized_loss_outer,
    hessian_unregularized_loss_middle,
    x_start,
    tolerance,
    l1_regularizer,
    l2_regularizer=None,
    maximum_full_sweeps=1,
    learning_rate=None,
    name=None
)

Defined in python/optimizer/proximal_hessian_sparse.py.

This function returns a new value of x, equal to x_start + x_update. The increment x_update in R^n is computed by a coordinate descent method, that is, by a loop in which each iteration updates exactly one coordinate of x_update. (Some updates may leave the value of the coordinate unchanged.)

The particular update method used is to apply an L1-based proximity operator, "soft threshold", whose fixed point x_update_fix is the desired minimum

x_update_fix = argmin{
    Loss(x_start + x_update')
      + l1_regularizer * ||x_start + x_update'||_1
      + l2_regularizer * ||x_start + x_update'||_2**2
    : x_update' }

where in each iteration x_update' is constrained to have at most one nonzero coordinate.
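For concreteness, a single-coordinate update of this kind can be sketched as follows. This is an illustration only, not the library's internal code: soft_threshold and coordinate_update are hypothetical helper names, and hess_jj stands for the j-th diagonal entry of the (approximate) Hessian.

import tensorflow as tf

def soft_threshold(z, gamma):
  # Proximity operator of gamma * ||.||_1: shrinks z toward 0 by gamma and
  # returns exactly 0 when |z| <= gamma.
  return tf.sign(z) * tf.maximum(tf.abs(z) - gamma, 0.)

def coordinate_update(x_j, grad_j, hess_jj, l1_regularizer, l2_regularizer=0.):
  # Minimize the local quadratic model of the regularized loss along
  # coordinate j:
  #   grad_j * dx + 0.5 * hess_jj * dx**2
  #     + l1_regularizer * |x_j + dx| + l2_regularizer * (x_j + dx)**2
  denom = hess_jj + 2. * l2_regularizer
  z = (hess_jj * x_j - grad_j) / denom   # unconstrained minimizer of the smooth part
  return soft_threshold(z, l1_regularizer / denom)

Note how the curvature hess_jj sets the effective step size: flat directions take larger steps, sharply curved ones take smaller steps.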

This update method preserves sparsity, i.e., tends to find sparse solutions if x_start is sparse. Additionally, the choice of step size is based on curvature (Hessian), which significantly speeds up convergence.

This algorithm assumes that Loss is convex, at least in a region surrounding the optimum. (If l2_regularizer > 0, then only weak convexity is needed.)

Args:

  • gradient_unregularized_loss: (Batch of) Tensor with the same shape and dtype as x_start representing the gradient, evaluated at x_start, of the unregularized loss function (denoted Loss above). (In all current use cases, Loss is the negative log likelihood.)
  • hessian_unregularized_loss_outer: (Batch of) Tensor or SparseTensor having the same dtype as x_start, and shape [N, n] where x_start has shape [n], satisfying the property Transpose(hessian_unregularized_loss_outer) @ diag(hessian_unregularized_loss_middle) @ hessian_unregularized_loss_outer = (approximation of) Hessian matrix of Loss, evaluated at x_start. (One way to construct these two factors is sketched after this list.)
  • hessian_unregularized_loss_middle: (Batch of) vector-shaped Tensor having the same dtype as x_start, and shape [N] where hessian_unregularized_loss_outer has shape [N, n], satisfying the same property: Transpose(hessian_unregularized_loss_outer) @ diag(hessian_unregularized_loss_middle) @ hessian_unregularized_loss_outer = (approximation of) Hessian matrix of Loss, evaluated at x_start.
  • x_start: (Batch of) vector-shaped, float Tensor representing the current value of the argument to the Loss function.
  • tolerance: scalar, float Tensor representing the convergence threshold. The optimization step will terminate early, returning its current value of x_start + x_update, once the following condition is met: ||x_update_end - x_update_start||_2 / (1 + ||x_start||_2) < sqrt(tolerance), where x_update_end is the value of x_update at the end of a sweep and x_update_start is the value of x_update at the beginning of that sweep.
  • l1_regularizer: scalar, float Tensor representing the weight of the L1 regularization term (see equation above). If L1 regularization is not required, then tfp.glm.fit_one_step is preferable.
  • l2_regularizer: scalar, float Tensor representing the weight of the L2 regularization term (see equation above). Default value: None (i.e., no L2 regularization).
  • maximum_full_sweeps: Python integer specifying maximum number of sweeps to run. A "sweep" consists of an iteration of coordinate descent on each coordinate. After this many sweeps, the algorithm will terminate even if convergence has not been reached. Default value: 1.
  • learning_rate: scalar, float Tensor representing a multiplicative factor used to dampen the proximal gradient descent steps. Default value: None (i.e., factor is conceptually 1).
  • name: Python string representing the name of the TensorFlow operation. The default name is "minimize_one_step".
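
As an example of how these arguments might be constructed, consider the case where Loss is a logistic-regression negative log likelihood, so that the Hessian factors as X^T diag(mu * (1 - mu)) X. The data (model_matrix, response) and shapes below are purely illustrative.

import tensorflow as tf

# Illustrative data: a design matrix X with N = 100 rows, n = 5 features,
# and binary responses y.
model_matrix = tf.random.normal([100, 5])                          # shape [N, n]
response = tf.cast(tf.random.uniform([100]) > 0.5, tf.float32)     # shape [N]
x_start = tf.zeros([5])                                            # coefficients, shape [n]

# Predicted means mu = sigmoid(X @ x) for the current coefficients.
mu = tf.math.sigmoid(tf.linalg.matvec(model_matrix, x_start))

# Gradient of the unregularized loss:  X^T (mu - y), shape [n].
gradient_unregularized_loss = tf.linalg.matvec(
    model_matrix, mu - response, transpose_a=True)

# Hessian factorization  X^T diag(mu * (1 - mu)) X:
# outer = X (shape [N, n]), middle = mu * (1 - mu) (shape [N]).
hessian_unregularized_loss_outer = model_matrix
hessian_unregularized_loss_middle = mu * (1. - mu)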

Returns:

  • x: (Batch of) Tensor having the same shape and dtype as x_start, representing the updated value of x, that is, x_start + x_update.
  • is_converged: scalar, bool Tensor indicating whether convergence occurred across all batches within the specified number of sweeps.
  • iter: scalar, int Tensor representing the actual number of coordinate updates made (before achieving convergence). Since each sweep consists of tf.size(x_start) iterations, the maximum number of updates is maximum_full_sweeps * tf.size(x_start).
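
Continuing the sketch above, a call and unpacking of the three return values might look like the following; the tolerance, regularizer, and sweep count are illustrative.

import tensorflow_probability as tfp

x_new, is_converged, num_iter = tfp.optimizer.proximal_hessian_sparse_one_step(
    gradient_unregularized_loss=gradient_unregularized_loss,
    hessian_unregularized_loss_outer=hessian_unregularized_loss_outer,
    hessian_unregularized_loss_middle=hessian_unregularized_loss_middle,
    x_start=x_start,
    tolerance=1e-6,
    l1_regularizer=0.1,
    maximum_full_sweeps=10)
# x_new is x_start + x_update (the value named `x` above); is_converged reports
# whether the tolerance was met within 10 sweeps; num_iter is the number of
# coordinate updates actually performed.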

References

[1]: Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 2010. https://www.jstatsoft.org/article/view/v033i01/v33i01.pdf

[2]: Guo-Xun Yuan, Chia-Hua Ho and Chih-Jen Lin. An Improved GLMNET for L1-regularized Logistic Regression. Journal of Machine Learning Research, 13, 2012. http://www.jmlr.org/papers/volume13/yuan12a/yuan12a.pdf