Stay organized with collections Save and categorize content based on your preferences.

One step of (the outer loop of) the GLM fitting algorithm.

This function returns a new value of model_coefficients, equal to model_coefficients_start + model_coefficients_update. The increment model_coefficients_update in R^n is computed by a coordinate descent method, that is, by a loop in which each iteration updates exactly one coordinate of model_coefficients_update. (Some updates may leave the value of the coordinate unchanged.)

The particular update method used is to apply an L1-based proximity operator, "soft threshold", whose fixed point model_coefficients_update^* is the desired minimum

model_coefficients_update^* = argmin{
    -LogLikelihood(model_coefficients_start + model_coefficients_update')
      + l1_regularizer *
          ||model_coefficients_start + model_coefficients_update'||_1
      + l2_regularizer *
          ||model_coefficients_start + model_coefficients_update'||_2**2
    : model_coefficients_update' }

where in each iteration model_coefficients_update' has at most one nonzero coordinate.

This update method preserves sparsity, i.e., tends to find sparse solutions if model_coefficients_start is sparse. Additionally, the choice of step size is based on curvature (Fisher information matrix), which significantly speeds up convergence.

model_matrix (Batch of) matrix-shaped, float Tensor or SparseTensor where each row represents a sample's features. Has shape [N, n] where N is the number of data samples and n is the number of features per sample.
response (Batch of) vector-shaped Tensor with the same dtype as model_matrix where each element represents a sample's observed response (to the corresponding row of features).
model tfp.glm.ExponentialFamily-like instance, which specifies the link function and distribution of the GLM, and thus characterizes the negative log-likelihood which will be minimized. Must have sufficient statistic equal to the response, that is, T(y) = y.
model_coefficients_start (Batch of) vector-shaped, float Tensor with the same dtype as model_matrix, representing the initial values of the coefficients for the GLM regression. Has shape [n] where model_matrix has shape [N, n].
tolerance scalar, float Tensor representing the convergence threshold. The optimization step will terminate early, returning its current value of model_coefficients_start + model_coefficients_update, once the following condition is met: ||model_coefficients_update_end - model_coefficients_update_start||_2 / (1 + ||model_coefficients_start||_2) < sqrt(tolerance), where model_coefficients_update_end is the value of model_coefficients_update at the end of a sweep and model_coefficients_update_start is the value of model_coefficients_update at the beginning of that sweep.
l1_regularizer scalar, float Tensor representing the weight of the L1 regularization term (see equation above).
l2_regularizer scalar, float Tensor representing the weight of the L2 regularization term (see equation above). Default value: None (i.e., no L2 regularization).
maximum_full_sweeps Python integer specifying maximum number of sweeps to run. A "sweep" consists of an iteration of coordinate descent on each coordinate. After this many sweeps, the algorithm will terminate even if convergence has not been reached. Default value: 1.
learning_rate scalar, float Tensor representing a multiplicative factor used to dampen the proximal gradient descent steps. Default value: None (i.e., factor is conceptually 1).
name Python string representing the name of the TensorFlow operation. The default name is "fit_sparse_one_step".

model_coefficients (Batch of) Tensor having the same shape and dtype as model_coefficients_start, representing the updated value of model_coefficients, that is, model_coefficients_start + model_coefficients_update.
is_converged scalar, bool Tensor indicating whether convergence occurred across all batches within the specified number of sweeps.
iter scalar, int Tensor representing the actual number of coordinate updates made (before achieving convergence). Since each sweep consists of tf.size(model_coefficients_start) iterations, the maximum number of updates is maximum_full_sweeps * tf.size(model_coefficients_start).