View source on GitHub |
One step of (the outer loop of) the GLM fitting algorithm.
tfp.glm.fit_sparse_one_step(
model_matrix,
response,
model,
model_coefficients_start,
tolerance,
l1_regularizer,
l2_regularizer=None,
maximum_full_sweeps=None,
learning_rate=None,
name=None
)
This function returns a new value of model_coefficients
, equal to
model_coefficients_start + model_coefficients_update
. The increment
model_coefficients_update in R^n
is computed by a coordinate descent method,
that is, by a loop in which each iteration updates exactly one coordinate of
model_coefficients_update
. (Some updates may leave the value of the
coordinate unchanged.)
The particular update method used is to apply an L1-based proximity operator,
"soft threshold", whose fixed point model_coefficients_update^*
is the
desired minimum
model_coefficients_update^* = argmin{
-LogLikelihood(model_coefficients_start + model_coefficients_update')
+ l1_regularizer *
||model_coefficients_start + model_coefficients_update'||_1
+ l2_regularizer *
||model_coefficients_start + model_coefficients_update'||_2**2
: model_coefficients_update' }
where in each iteration model_coefficients_update'
has at most one nonzero
coordinate.
This update method preserves sparsity, i.e., tends to find sparse solutions if
model_coefficients_start
is sparse. Additionally, the choice of step size
is based on curvature (Fisher information matrix), which significantly speeds
up convergence.
Args | |
---|---|
model_matrix
|
(Batch of) matrix-shaped, float Tensor or SparseTensor
where each row represents a sample's features. Has shape [N, n] where
N is the number of data samples and n is the number of features per
sample.
|
response
|
(Batch of) vector-shaped Tensor with the same dtype as
model_matrix where each element represents a sample's observed response
(to the corresponding row of features).
|
model
|
tfp.glm.ExponentialFamily -like instance, which specifies the link
function and distribution of the GLM, and thus characterizes the negative
log-likelihood which will be minimized. Must have sufficient statistic
equal to the response, that is, T(y) = y .
|
model_coefficients_start
|
(Batch of) vector-shaped, float Tensor with
the same dtype as model_matrix , representing the initial values of the
coefficients for the GLM regression. Has shape [n] where model_matrix
has shape [N, n] .
|
tolerance
|
scalar, float Tensor representing the convergence threshold.
The optimization step will terminate early, returning its current value of
model_coefficients_start + model_coefficients_update , once the following
condition is met:
||model_coefficients_update_end - model_coefficients_update_start||_2
/ (1 + ||model_coefficients_start||_2)
< sqrt(tolerance) ,
where model_coefficients_update_end is the value of
model_coefficients_update at the end of a sweep and
model_coefficients_update_start is the value of
model_coefficients_update at the beginning of that sweep.
|
l1_regularizer
|
scalar, float Tensor representing the weight of the L1
regularization term (see equation above).
|
l2_regularizer
|
scalar, float Tensor representing the weight of the L2
regularization term (see equation above).
Default value: None (i.e., no L2 regularization).
|
maximum_full_sweeps
|
Python integer specifying maximum number of sweeps to
run. A "sweep" consists of an iteration of coordinate descent on each
coordinate. After this many sweeps, the algorithm will terminate even if
convergence has not been reached.
Default value: 1 .
|
learning_rate
|
scalar, float Tensor representing a multiplicative factor
used to dampen the proximal gradient descent steps.
Default value: None (i.e., factor is conceptually 1 ).
|
name
|
Python string representing the name of the TensorFlow operation. The
default name is "fit_sparse_one_step" .
|
Returns | |
---|---|
model_coefficients
|
(Batch of) Tensor having the same shape and dtype as
model_coefficients_start , representing the updated value of
model_coefficients , that is, model_coefficients_start +
model_coefficients_update .
|
is_converged
|
scalar, bool Tensor indicating whether convergence
occurred across all batches within the specified number of sweeps.
|
iter
|
scalar, int Tensor representing the actual number of coordinate
updates made (before achieving convergence). Since each sweep consists of
tf.size(model_coefficients_start) iterations, the maximum number of
updates is maximum_full_sweeps * tf.size(model_coefficients_start) .
|