Formal representation of a sparse linear regression.
Inherits From: StructuralTimeSeries
tfp.sts.SparseLinearRegression(
    design_matrix, weights_prior_scale=0.1, weights_batch_shape=None, name=None
)
This model defines a time series given by a sparse linear combination of covariate time series provided in a design matrix:
observed_time_series = matmul(design_matrix, weights)
This is identical to tfp.sts.LinearRegression, except that SparseLinearRegression uses a parameterization of a Horseshoe prior [1][2] to encode the assumption that many of the weights are zero, i.e., many of the covariate time series are irrelevant. See the mathematical details section below for further discussion. The prior parameterization used by SparseLinearRegression is more suitable for inference than that obtained by simply passing the equivalent tfd.Horseshoe prior to LinearRegression; when sparsity is desired, SparseLinearRegression will likely yield better results.
This component does not itself include observation noise; it defines a deterministic distribution with mass at the point matmul(design_matrix, weights). In practice, it should be combined with observation noise from another component such as tfp.sts.Sum.
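To make the deterministic behavior concrete, here is a minimal plain-Python sketch of the regression term with a sparse weight vector. It is not the TFP implementation; the helper name regression_effect and the toy numbers are illustrative assumptions.

```python
def regression_effect(design_matrix, weights):
    """Return matmul(design_matrix, weights) for a [num_timesteps, num_features] matrix."""
    return [sum(x * w for x, w in zip(row, weights)) for row in design_matrix]

# Three timesteps, two covariates (toy values).
design_matrix = [[1.0, 5.0],
                 [2.0, -3.0],
                 [3.0, 4.0]]
# A sparse weight vector: the second covariate is irrelevant.
weights = [0.5, 0.0]

effect = regression_effect(design_matrix, weights)
print(effect)  # [0.5, 1.0, 1.5]
```

The output is fully determined by the design matrix and the weights; any randomness in the observed series has to come from a separate noise term.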
Examples
Given series1, series2 as Tensors each of shape [num_timesteps] representing covariate time series, we create a regression model that conditions on these covariates:
regression = tfp.sts.SparseLinearRegression(
    design_matrix=tf.stack([series1, series2], axis=-1),
    weights_prior_scale=0.1)
The weights_prior_scale determines the level of sparsity; small scales encourage the weights to be sparse. In some cases, such as when the likelihood is iid Gaussian with known scale, the prior scale can be analytically related to the expected number of nonzero weights [2]; however, this is not the case in general for STS models.
If the design matrix has batch dimensions, by default the model will create a matching batch of weights. For example, if design_matrix.shape == [num_users, num_timesteps, num_features], by default the model will fit separate weights for each user, i.e., it will internally represent weights.shape == [num_users, num_features]. To share weights across some or all batch dimensions, you can manually specify the batch shape for the weights:
# design_matrix.shape == [num_users, num_timesteps, num_features]
regression = tfp.sts.SparseLinearRegression(
    design_matrix=design_matrix,
    weights_batch_shape=[])  # weights.shape -> [num_features]
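The difference between per-user and shared weights can be illustrated with plain NumPy shapes. This is only a sketch of the shape arithmetic, not of how TFP represents the weights internally; the array names and sizes are assumptions.

```python
import numpy as np

num_users, num_timesteps, num_features = 3, 4, 2
rng = np.random.default_rng(0)
design_matrix = rng.normal(size=(num_users, num_timesteps, num_features))

# Default: one weight vector per user, weights.shape == [num_users, num_features].
per_user_weights = rng.normal(size=(num_users, num_features))
per_user_effect = np.einsum("utf,uf->ut", design_matrix, per_user_weights)

# weights_batch_shape=[]: a single weight vector shared by all users.
shared_weights = rng.normal(size=(num_features,))
shared_effect = design_matrix @ shared_weights

print(per_user_effect.shape, shared_effect.shape)  # (3, 4) (3, 4)
```

In both cases the regression output has one series per user; only the number of free weights changes.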
Mathematical Details
The basic horseshoe prior [1] is defined as a Cauchy-normal scale mixture:
scales[i] ~ HalfCauchy(loc=0, scale=1)
weights[i] ~ Normal(loc=0., scale=scales[i] * global_scale)
The Cauchy scale parameters put substantial mass near zero, encouraging weights to be sparse, but their heavy tails allow weights far from zero to be estimated without excessive shrinkage. The horseshoe can be thought of as a continuous relaxation of a traditional 'spike-and-slab' discrete sparsity prior, in which the latent Cauchy scale mixes between 'spike' (scales[i] ~= 0) and 'slab' (scales[i] >> 0) regimes.
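The spike-and-slab behavior of the basic horseshoe can be seen by sampling from it directly. The sketch below uses only the Python standard library rather than TFP; the function name sample_horseshoe_weights and the fixed global_scale value are illustrative assumptions, and the half-Cauchy draw uses the inverse CDF tan(pi * u / 2).

```python
import math
import random

def sample_horseshoe_weights(num_weights, global_scale, rng):
    """Draw weights from the basic horseshoe prior with a fixed global scale."""
    weights = []
    for _ in range(num_weights):
        # HalfCauchy(0, 1) draw via the inverse CDF, with u ~ Uniform[0, 1).
        local_scale = math.tan(math.pi * rng.random() / 2.0)
        weights.append(rng.gauss(0.0, local_scale * global_scale))
    return weights

rng = random.Random(0)
weights = sample_horseshoe_weights(num_weights=10_000, global_scale=0.1, rng=rng)

abs_weights = sorted(abs(w) for w in weights)
print("median |w|:", abs_weights[len(abs_weights) // 2])
print("max |w|:", abs_weights[-1])
```

Most draws land close to zero (the 'spike'), while a small fraction are orders of magnitude larger (the 'slab').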
Following the recommendations in [2], SparseLinearRegression implements a horseshoe with the following adaptations:
- The Cauchy prior on scales[i] is represented as an InverseGamma-Normal compound.
- The global_scale parameter is integrated out following a Cauchy(0., scale=weights_prior_scale) hyperprior, which is also represented as an InverseGamma-Normal compound.
- All compound distributions are implemented using a non-centered parameterization.
The compound, non-centered representation defines the same marginal prior as the original horseshoe (up to integrating out the global scale), but allows samplers to mix more efficiently through the heavy tails; for variational inference, the compound representation implicitly expands the representational power of the variational model.
Note that we do not yet implement the regularized ('Finnish') horseshoe, proposed in [2] for models with weak likelihoods, because the likelihood in STS models is typically Gaussian, where it's not clear that additional regularization is appropriate. If you need this functionality, please email tfprobability@tensorflow.org.
The full prior parameterization implemented in SparseLinearRegression is as follows:
# Sample global_scale from Cauchy(0, scale=weights_prior_scale).
global_scale_variance ~ InverseGamma(alpha=0.5, beta=0.5)
global_scale_noncentered ~ HalfNormal(loc=0, scale=1)
global_scale = (global_scale_noncentered *
                sqrt(global_scale_variance) *
                weights_prior_scale)
# Sample local_scales from Cauchy(0, 1).
local_scale_variances[i] ~ InverseGamma(alpha=0.5, beta=0.5)
local_scales_noncentered[i] ~ HalfNormal(loc=0, scale=1)
local_scales[i] = local_scales_noncentered[i] * sqrt(local_scale_variances[i])
weights[i] ~ Normal(loc=0., scale=local_scales[i] * global_scale)
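The parameterization above can be checked empirically with a standard-library sketch. It relies on the fact that a HalfNormal(1) draw multiplied by the square root of an InverseGamma(0.5, 0.5) draw is HalfCauchy(0, 1) distributed, whose median is 1. The function names are hypothetical and this is not the TFP implementation.

```python
import math
import random

def sample_half_cauchy_compound(scale, rng):
    """HalfCauchy(0, scale) draw via the InverseGamma-Normal compound."""
    # InverseGamma(alpha=0.5, beta=0.5) is the reciprocal of a
    # Gamma(shape=0.5, rate=0.5) draw; gammavariate takes (shape, scale=1/rate).
    variance = 1.0 / rng.gammavariate(0.5, 2.0)
    noncentered = abs(rng.gauss(0.0, 1.0))  # HalfNormal(scale=1)
    return noncentered * math.sqrt(variance) * scale

def sample_weights(num_features, weights_prior_scale, rng):
    """Draw one weight vector following the parameterization listed above."""
    global_scale = sample_half_cauchy_compound(weights_prior_scale, rng)
    local_scales = [sample_half_cauchy_compound(1.0, rng)
                    for _ in range(num_features)]
    return [rng.gauss(0.0, s * global_scale) for s in local_scales]

rng = random.Random(0)
draws = sorted(sample_half_cauchy_compound(1.0, rng) for _ in range(20_000))
print("empirical median:", draws[len(draws) // 2])  # HalfCauchy(0, 1) has median 1
print(sample_weights(num_features=5, weights_prior_scale=0.1, rng=rng))
```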
References
[1]: Carvalho, C., Polson, N. and Scott, J. Handling Sparsity via the Horseshoe. AISTATS (2009). http://proceedings.mlr.press/v5/carvalho09a/carvalho09a.pdf
[2]: Piironen, J. and Vehtari, A. Sparsity information and regularization in the horseshoe and other shrinkage priors (2017). https://arxiv.org/abs/1707.01694
Args | |
---|---|
design_matrix | float Tensor of shape concat([batch_shape, [num_timesteps, num_features]]). This may also optionally be an instance of tf.linalg.LinearOperator. |
weights_prior_scale | float Tensor defining the scale of the Horseshoe prior on regression weights. Small values encourage the weights to be sparse. The shape must broadcast with weights_batch_shape. Default value: 0.1. |
weights_batch_shape | If None, defaults to design_matrix.batch_shape_tensor(). Must broadcast with the batch shape of design_matrix. Default value: None. |
name | The name of this model component. Default value: 'SparseLinearRegression'. |
Attributes | |
---|---|
batch_shape | Static batch shape of models represented by this component. |
design_matrix | LinearOperator representing the design matrix. |
latent_size | Python int dimensionality of the latent space in this model. |
name | Name of this model component. |
parameters | List of Parameter(name, prior, bijector) namedtuples for this model. |
weights_prior_scale | |
Methods
batch_shape_tensor
batch_shape_tensor()
Runtime batch shape of models represented by this component.
Returns | |
---|---|
batch_shape | int Tensor giving the broadcast batch shape of all model parameters. This should match the batch shape of derived state space models, i.e., self.make_state_space_model(...).batch_shape_tensor(). |
joint_log_prob
joint_log_prob(
    observed_time_series
)
Build the joint density log p(params) + log p(y|params) as a callable.
Args | |
---|---|
observed_time_series | Observed Tensor trajectories of shape sample_shape + batch_shape + [num_timesteps, 1] (the trailing 1 dimension is optional if num_timesteps > 1), where batch_shape should match self.batch_shape (the broadcast batch shape of all priors on parameters for this structural time series model). May optionally be an instance of tfp.sts.MaskedTimeSeries, which includes a mask Tensor to specify timesteps with missing observations. |
Returns | |
---|---|
log_joint_fn | A function taking a Tensor argument for each model parameter, in canonical order, and returning a Tensor log probability of shape batch_shape. Note that, unlike tfp.distributions log_prob methods, the log_joint sums over the sample_shape from y, so that sample_shape does not appear in the output log_prob. This corresponds to viewing multiple samples in y as iid observations from a single model, which is typically the desired behavior for parameter inference. |
make_state_space_model
make_state_space_model(
    num_timesteps, param_vals, initial_state_prior=None, initial_step=0
)
Instantiate this model as a Distribution over specified num_timesteps.
Args | |
---|---|
num_timesteps | Python int number of timesteps to model. |
param_vals | A list of Tensor parameter values in order corresponding to self.parameters, or a dict mapping from parameter names to values. |
initial_state_prior | An optional Distribution instance overriding the default prior on the model's initial state. This is used in forecasting ("today's prior is yesterday's posterior"). |
initial_step | Optional int specifying the initial timestep to model. This is relevant when the model contains time-varying components, e.g., holidays or seasonality. |
Returns | |
---|---|
dist | A LinearGaussianStateSpaceModel Distribution object. |
params_to_weights
params_to_weights(
    global_scale_variance, global_scale_noncentered, local_scale_variances,
    local_scales_noncentered, weights_noncentered
)
Build regression weights from model parameters.
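As a rough illustration of what such a mapping involves, the sketch below rebuilds a weight vector from the five non-centered parameters using the formulas in the Mathematical Details section. It is a hypothetical plain-Python analogue for a single unbatched weight vector, not the library's code; in particular, treating weights_noncentered as a unit-scale draw that is rescaled by both scales is an assumption based on the non-centered parameterization described there.

```python
import math

def params_to_weights_sketch(global_scale_variance, global_scale_noncentered,
                             local_scale_variances, local_scales_noncentered,
                             weights_noncentered, weights_prior_scale):
    """Hypothetical plain-Python analogue of params_to_weights."""
    global_scale = (global_scale_noncentered *
                    math.sqrt(global_scale_variance) *
                    weights_prior_scale)
    local_scales = [n * math.sqrt(v)
                    for n, v in zip(local_scales_noncentered,
                                    local_scale_variances)]
    # Assumed non-centered form: a unit-scale draw rescaled by both scales.
    return [w * s * global_scale
            for w, s in zip(weights_noncentered, local_scales)]

weights = params_to_weights_sketch(
    global_scale_variance=4.0, global_scale_noncentered=0.5,
    local_scale_variances=[1.0, 9.0], local_scales_noncentered=[2.0, 1.0],
    weights_noncentered=[1.0, -1.0], weights_prior_scale=0.5)
print(weights)  # [1.0, -1.5]
```

Here global_scale evaluates to 0.5 and the local scales to [2.0, 3.0], so each weight is its non-centered value multiplied by the product of the two.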
prior_sample
prior_sample(
    num_timesteps, initial_step=0, params_sample_shape=(),
    trajectories_sample_shape=(), seed=None
)
Sample from the joint prior over model parameters and trajectories.
Args | |
---|---|
num_timesteps | Scalar int Tensor number of timesteps to model. |
initial_step | Optional scalar int Tensor specifying the starting timestep. Default value: 0. |
params_sample_shape | Number of possible worlds to sample iid from the parameter prior, or more generally, Tensor int shape to fill with iid samples. Default value: [] (i.e., draw a single sample and don't expand the shape). |
trajectories_sample_shape | For each sampled set of parameters, number of trajectories to sample, or more generally, Tensor int shape to fill with iid samples. Default value: [] (i.e., draw a single sample and don't expand the shape). |
seed | Python int random seed. |
Returns | |
---|---|
trajectories | float Tensor of shape trajectories_sample_shape + params_sample_shape + [num_timesteps, 1] containing all sampled trajectories. |
param_samples | List of sampled parameter value Tensors, in order corresponding to self.parameters, each of shape params_sample_shape + prior.batch_shape + prior.event_shape. |