TF 2.0 is out! Get hands-on practice at TF World, Oct 28-31. Use code TF20 for 20% off select passes. Register now

tfp.glm.fit_sparse

View source on GitHub

Fits a GLM using coordinate-wise FIM-informed proximal gradient descent.

tfp.glm.fit_sparse(
    model_matrix,
    response,
    model,
    model_coefficients_start,
    tolerance,
    l1_regularizer,
    l2_regularizer=None,
    maximum_iterations=None,
    maximum_full_sweeps_per_iteration=1,
    learning_rate=None,
    name=None
)

This function uses a L1- and L2-regularized, second-order quasi-Newton method to find maximum-likelihood parameters for the given model and observed data. The second-order approximations use negative Fisher information in place of the Hessian, that is,

FisherInfo = E_Y[Hessian with respect to model_coefficients of -LogLikelihood(
    Y | model_matrix, current value of model_coefficients)]

For large, sparse data sets, model_matrix should be supplied as a SparseTensor.

Args:

  • model_matrix: (Batch of) matrix-shaped, float Tensor or SparseTensor where each row represents a sample's features. Has shape [N, n] where N is the number of data samples and n is the number of features per sample.
  • response: (Batch of) vector-shaped Tensor with the same dtype as model_matrix where each element represents a sample's observed response (to the corresponding row of features).
  • model: tfp.glm.ExponentialFamily-like instance, which specifies the link function and distribution of the GLM, and thus characterizes the negative log-likelihood which will be minimized. Must have sufficient statistic equal to the response, that is, T(y) = y.
  • model_coefficients_start: (Batch of) vector-shaped, float Tensor with the same dtype as model_matrix, representing the initial values of the coefficients for the GLM regression. Has shape [n] where model_matrix has shape [N, n].
  • tolerance: scalar, float Tensor representing the tolerance for each optiization step; see the tolerance argument of fit_sparse_one_step.
  • l1_regularizer: scalar, float Tensor representing the weight of the L1 regularization term.
  • l2_regularizer: scalar, float Tensor representing the weight of the L2 regularization term. Default value: None (i.e., no L2 regularization).
  • maximum_iterations: Python integer specifying maximum number of iterations of the outer loop of the optimizer (i.e., maximum number of calls to fit_sparse_one_step). After this many iterations of the outer loop, the algorithm will terminate even if the return value model_coefficients has not converged. Default value: 1.
  • maximum_full_sweeps_per_iteration: Python integer specifying the maximum number of coordinate descent sweeps allowed in each iteration. Default value: 1.
  • learning_rate: scalar, float Tensor representing a multiplicative factor used to dampen the proximal gradient descent steps. Default value: None (i.e., factor is conceptually 1).
  • name: Python string representing the name of the TensorFlow operation. The default name is "fit_sparse".

Returns:

  • model_coefficients: (Batch of) Tensor of the same shape and dtype as model_coefficients_start, representing the computed model coefficients which minimize the regularized negative log-likelihood.
  • is_converged: scalar, bool Tensor indicating whether the minimization procedure converged across all batches within the specified number of iterations. Here convergence means that an iteration of the inner loop (fit_sparse_one_step) returns True for its is_converged output value.
  • iter: scalar, int Tensor indicating the actual number of iterations of the outer loop of the optimizer completed (i.e., number of calls to fit_sparse_one_step before achieving convergence).

Example

from __future__ import print_function
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def make_dataset(n, d, link, scale=1., dtype=np.float32):
  model_coefficients = tfd.Uniform(
      low=np.array(-1, dtype), high=np.array(1, dtype)).sample(
          d, seed=42)
  radius = np.sqrt(2.)
  model_coefficients *= radius / tf.linalg.norm(model_coefficients)
  mask = tf.random_shuffle(tf.range(d)) < tf.to_int32(0.5 * tf.to_float(d))
  model_coefficients = tf.where(mask, model_coefficients,
                                tf.zeros_like(model_coefficients))
  model_matrix = tfd.Normal(
      loc=np.array(0, dtype), scale=np.array(1, dtype)).sample(
          [n, d], seed=43)
  scale = tf.convert_to_tensor(scale, dtype)
  linear_response = tf.matmul(model_matrix,
                              model_coefficients[..., tf.newaxis])[..., 0]
  if link == 'linear':
    response = tfd.Normal(loc=linear_response, scale=scale).sample(seed=44)
  elif link == 'probit':
    response = tf.cast(
        tfd.Normal(loc=linear_response, scale=scale).sample(seed=44) > 0,
                   dtype)
  elif link == 'logit':
    response = tfd.Bernoulli(logits=linear_response).sample(seed=44)
  else:
    raise ValueError('unrecognized true link: {}'.format(link))
  return model_matrix, response, model_coefficients, mask

with tf.Session() as sess:
  x_, y_, model_coefficients_true_, _ = sess.run(make_dataset(
      n=int(1e5), d=100, link='probit'))

  model = tfp.glm.Bernoulli()
  model_coefficients_start = tf.zeros(x_.shape[-1], np.float32)

  model_coefficients, is_converged, num_iter = tfp.glm.fit_sparse(
      model_matrix=tf.convert_to_tensor(x_),
      response=tf.convert_to_tensor(y_),
      model=model,
      model_coefficients_start=model_coefficients_start,
      l1_regularizer=800.,
      l2_regularizer=None,
      maximum_iterations=10,
      maximum_full_sweeps_per_iteration=10,
      tolerance=1e-6,
      learning_rate=None)

  model_coefficients_, is_converged_, num_iter_ = sess.run([
      model_coefficients, is_converged, num_iter])

  print("is_converged:", is_converged_)
  print("    num_iter:", num_iter_)
  print("\nLearned / True")
  print(np.concatenate(
      [[model_coefficients_], [model_coefficients_true_]], axis=0).T)

# ==>
# is_converged: True
#     num_iter: 1
#
# Learned / True
# [[ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.11195257  0.12484948]
#  [ 0.          0.        ]
#  [ 0.05191106  0.06394956]
#  [-0.15090358 -0.15325639]
#  [-0.18187316 -0.18825999]
#  [-0.06140942 -0.07994166]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.14474444  0.15810856]
#  [ 0.          0.        ]
#  [-0.25249591 -0.24260855]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.03888761 -0.06755984]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.0192222  -0.04169233]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.01434913  0.03568212]
#  [-0.11336883 -0.12873614]
#  [ 0.          0.        ]
#  [-0.24496339 -0.24048163]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.04088281  0.06565224]
#  [-0.12784363 -0.13359821]
#  [ 0.05618424  0.07396613]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [ 0.         -0.01719233]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.00076072 -0.03607186]
#  [ 0.21801499  0.21146794]
#  [-0.02161094 -0.04031265]
#  [ 0.0918689   0.10487888]
#  [ 0.0106154   0.03233612]
#  [-0.07817317 -0.09725142]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.23725343 -0.24194022]
#  [ 0.          0.        ]
#  [-0.08725718 -0.1048776 ]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.02114314 -0.04145789]
#  [ 0.          0.        ]
#  [ 0.          0.        ]
#  [-0.02710908 -0.04590397]
#  [ 0.15293184  0.15415154]
#  [ 0.2114463   0.2088728 ]
#  [-0.10969634 -0.12368613]
#  [ 0.         -0.01505797]
#  [-0.01140458 -0.03234904]
#  [ 0.16051085  0.1680062 ]
#  [ 0.09816848  0.11094204]

References

[1]: Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 2010. https://www.jstatsoft.org/article/view/v033i01/v33i01.pdf

[2]: Guo-Xun Yuan, Chia-Hua Ho and Chih-Jen Lin. An Improved GLMNET for L1-regularized Logistic Regression. Journal of Machine Learning Research, 13, 2012. http://www.jmlr.org/papers/volume13/yuan12a/yuan12a.pdf