Minimize using Hessian-informed proximal gradient descent.
tfp.optimizer.proximal_hessian_sparse_minimize(
    grad_and_hessian_loss_fn,
    x_start,
    tolerance,
    l1_regularizer,
    l2_regularizer=None,
    maximum_iterations=1,
    maximum_full_sweeps_per_iteration=1,
    learning_rate=None,
    name=None
)
This function solves the regularized minimization problem

  argmin{ Loss(x)
            + l1_regularizer * ||x||_1
            + l2_regularizer * ||x||_2**2
          : x in R^n }
where Loss is a convex C^2 function (typically, Loss is the negative
log-likelihood of a model and x is a vector of model coefficients). The Loss
function does not need to be supplied directly, but this optimizer does need a
way to compute the gradient and Hessian of the Loss function at a given value
of x. The gradient and Hessian are often computationally expensive, and this
optimizer calls them relatively few times compared with other algorithms.
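As an illustrative sketch (not part of the official documentation), the snippet
below applies the optimizer to an L1-regularized least-squares problem with
Loss(x) = 0.5 * ||A x - b||**2. For this loss the gradient is A^T (A x - b) and
the Hessian is A^T A, which factors into the "outer" matrix A and "middle"
weights of all ones expected by minimize_one_step. The data and hyperparameter
values are made up, and the three return values assume the Returns section
below.

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

# Hypothetical problem data for Loss(x) = 0.5 * ||A @ x - b||**2.
rng = np.random.RandomState(0)
A = tf.constant(rng.randn(100, 5), dtype=tf.float32)
b = tf.constant(rng.randn(100), dtype=tf.float32)

def grad_and_hessian_loss_fn(x):
  residual = tf.linalg.matvec(A, x) - b
  # Gradient of the unregularized loss: A^T (A x - b).
  grad = tf.linalg.matvec(A, residual, transpose_a=True)
  # Hessian of the loss is A^T diag(1) A: outer factor A, middle weights 1.
  return grad, A, tf.ones_like(b)

x, is_converged, num_iters = tfp.optimizer.proximal_hessian_sparse_minimize(
    grad_and_hessian_loss_fn,
    x_start=tf.zeros([5], dtype=tf.float32),
    tolerance=1e-6,
    l1_regularizer=0.1,
    maximum_iterations=10,
    maximum_full_sweeps_per_iteration=5)

Larger values of l1_regularizer drive more coefficients of x exactly to zero,
which is the usual reason to prefer this proximal method over a plain gradient
descent on the regularized loss.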
Args

grad_and_hessian_loss_fn: callable that takes as input a (batch of) Tensor of
  the same shape and dtype as x_start and returns the triple
  (gradient_unregularized_loss, hessian_unregularized_loss_outer,
  hessian_unregularized_loss_middle) as defined in the argument spec of
  minimize_one_step. A sketch of such a callable for a logistic-regression
  loss follows this argument list.
x_start: (Batch of) vector-shaped, float Tensor representing the initial
  value of the argument to the Loss function.
tolerance: scalar, float Tensor representing the tolerance for each
  optimization step; see the tolerance argument of minimize_one_step.
l1_regularizer: scalar, float Tensor representing the weight of the L1
  regularization term (see equation above).
l2_regularizer: scalar, float Tensor representing the weight of the L2
  regularization term (see equation above).
  Default value: None (i.e., no L2 regularization).
maximum_iterations: Python integer specifying the maximum number of
  iterations of the outer loop of the optimizer. After this many iterations
  of the outer loop, the algorithm will terminate even if the return value
  optimal_x has not converged.
  Default value: 1.
maximum_full_sweeps_per_iteration: Python integer specifying the maximum
  number of sweeps allowed in each iteration of the outer loop of the
  optimizer. Passed as the maximum_full_sweeps argument to minimize_one_step.
  Default value: 1.
learning_rate: scalar, float Tensor representing a multiplicative factor
  used to dampen the proximal gradient descent steps.
  Default value: None (i.e., the factor is conceptually 1).
name: Python string representing the name of the TensorFlow operation.
  The default name is "minimize".
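Since Loss is typically a GLM negative log-likelihood, the Hessian
factorization expected by grad_and_hessian_loss_fn usually takes the form
design^T diag(w) design. The helper below is a hedged sketch
(make_logistic_grad_and_hessian_fn is an illustrative name, not part of TFP)
showing how such a callable could be assembled for a logistic-regression loss.

import tensorflow as tf

def make_logistic_grad_and_hessian_fn(design, labels):
  # Assumed setup: `design` has shape [N, n], `labels` has shape [N] with
  # values in {0, 1}. The negative log-likelihood has gradient
  # design^T (p - labels) and Hessian design^T diag(p * (1 - p)) design,
  # where p = sigmoid(design @ x); `design` and `p * (1 - p)` play the roles
  # of the "outer" and "middle" factors expected by minimize_one_step.
  def grad_and_hessian_loss_fn(x):
    p = tf.math.sigmoid(tf.linalg.matvec(design, x))
    grad = tf.linalg.matvec(design, p - labels, transpose_a=True)
    return grad, design, p * (1. - p)
  return grad_and_hessian_loss_fn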
Returns

x: Tensor of the same shape and dtype as x_start, representing the (batch
  of) computed values of x which minimize Loss(x).
is_converged: scalar, bool Tensor indicating whether the minimization
  procedure converged within the specified number of iterations across all
  batches. Here convergence means that an iteration of the inner loop
  (minimize_one_step) returns True for its is_converged output value.
iter: scalar, int Tensor indicating the actual number of iterations of the
  outer loop of the optimizer completed (i.e., number of calls to
  minimize_one_step before achieving convergence).
References
[1]: Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths
for Generalized Linear Models via Coordinate Descent. Journal of
Statistical Software, 33(1), 2010.
https://www.jstatsoft.org/article/view/v033i01/v33i01.pdf
[2]: Guo-Xun Yuan, Chia-Hua Ho and Chih-Jen Lin. An Improved GLMNET for
L1-regularized Logistic Regression. Journal of Machine Learning
Research, 13, 2012.
http://www.jmlr.org/papers/volume13/yuan12a/yuan12a.pdf