
tfp.sts.DynamicLinearRegressionStateSpaceModel

Class DynamicLinearRegressionStateSpaceModel

State space model for a dynamic linear regression from provided covariates.

Inherits From: LinearGaussianStateSpaceModel

Defined in python/sts/dynamic_regression.py.

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model. It supports tractable exact probabilistic calculations; see tfp.distributions.LinearGaussianStateSpaceModel for details.

The dynamic linear regression model is a special case of a linear Gaussian SSM and a generalization of typical (static) linear regression. The model represents regression weights with a latent state which evolves via a Gaussian random walk:

weights[t] ~ Normal(weights[t-1], drift_scale)

The latent state (the weights) has dimension num_features, while the parameters drift_scale and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters, the initial_state_prior, and the design_matrix. num_features is determined from the last dimension of design_matrix (equivalent to the number of columns in the design matrix in linear regression).

Mathematical Details

The dynamic linear regression model implements a tfp.distributions.LinearGaussianStateSpaceModel with latent_size = num_features and observation_size = 1 following the transition model:

transition_matrix = eye(num_features)
transition_noise ~ Normal(0, diag([drift_scale]))

which implements the evolution of weights described above. The observation model is:

observation_matrix[t] = design_matrix[t]
observation_noise ~ Normal(0, observation_noise_scale)
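The generative process described by these transition and observation models can be sketched in plain NumPy. This is an illustration only, not the library's implementation: the function name simulate_dynamic_regression and the initial_scale_diag argument are hypothetical, and the initial weights are assumed to come from a zero-mean diagonal Gaussian prior.

```python
import numpy as np

def simulate_dynamic_regression(design_matrix, drift_scale,
                                observation_noise_scale,
                                initial_scale_diag, seed=0):
    """Draws one weight trajectory and one observed series (a sketch)."""
    rng = np.random.default_rng(seed)
    design_matrix = np.asarray(design_matrix, dtype=float)
    num_timesteps, num_features = design_matrix.shape
    weights = np.empty((num_timesteps, num_features))
    observations = np.empty((num_timesteps, 1))  # observation_size = 1
    # Assumption: weights[0] ~ Normal(0, diag(initial_scale_diag)).
    current = (np.asarray(initial_scale_diag, dtype=float)
               * rng.standard_normal(num_features))
    for t in range(num_timesteps):
        if t > 0:
            # transition_matrix = eye(num_features), so the weights
            # follow a Gaussian random walk with scale drift_scale.
            current = current + drift_scale * rng.standard_normal(num_features)
        weights[t] = current
        # observation_matrix[t] = design_matrix[t], plus scalar noise.
        noise = observation_noise_scale * rng.standard_normal()
        observations[t, 0] = design_matrix[t] @ current + noise
    return weights, observations

# Two covariate series of length 42, stacked into a [42, 2] design matrix.
rng = np.random.default_rng(42)
series1, series2 = rng.standard_normal(42), rng.standard_normal(42)
design_matrix = np.stack([series1, series2], axis=-1)
weights, y = simulate_dynamic_regression(design_matrix, 3.14, 1.0, [1., 2.])
print(weights.shape, y.shape)
```

Setting drift_scale to zero in this sketch keeps the weights constant over time, which recovers ordinary (static) linear regression with Gaussian noise.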

Examples

Given series1, series2 as Tensors each of shape [num_timesteps] representing covariate time series, we create a dynamic regression model which conditions on these via the following:

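import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
DynamicLinearRegressionStateSpaceModel = tfp.sts.DynamicLinearRegressionStateSpaceModel

dynamic_regression_ssm = DynamicLinearRegressionStateSpaceModel(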
    num_timesteps=42,
    design_matrix=tf.stack([series1, series2], axis=-1),
    drift_scale=3.14,
    initial_state_prior=tfd.MultivariateNormalDiag(scale_diag=[1., 2.]),
    observation_noise_scale=1.)

y = dynamic_regression_ssm.sample()  # shape [42, 1]
lp = dynamic_regression_ssm.log_prob(y)  # scalar

Passing additional parameter and initial_state_prior dimensions constructs a batch of models. Consider the following:

dynamic_regression_ssm = DynamicLinearRegressionStateSpaceModel(
    num_timesteps=42,
    design_matrix=tf.stack([series1, series2], axis=-1),
    drift_scale=[3.14, 1.],
    initial_state_prior=tfd.MultivariateNormalDiag(scale_diag=[1., 2.]),
    observation_noise_scale=[1., 2.])

y = dynamic_regression_ssm.sample(3)  # shape [3, 2, 42, 1]
lp = dynamic_regression_ssm.log_prob(y)  # shape [3, 2]

This (effectively) constructs two independent state space models: the first with drift_scale = 3.14 and observation_noise_scale = 1., and the second with drift_scale = 1. and observation_noise_scale = 2. We then sample from each of the models three times and calculate the log probability of each of the samples under each of the models.

Similarly, it is also possible to add batch dimensions via the design_matrix.
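The shape bookkeeping in these examples can be checked without building a model. The sketch below assumes that batch shapes combine under NumPy-style broadcasting and that sample results are laid out as sample_shape + batch_shape + event_shape; the helper broadcast_batch_shape and the batched design matrix shape are hypothetical.

```python
import numpy as np

def broadcast_batch_shape(*shapes):
    """Broadcasts the given batch shapes using NumPy's broadcasting rules."""
    return np.broadcast(*[np.empty(shape) for shape in shapes]).shape

event_shape = (42, 1)  # [num_timesteps, observation_size]

# Batched example above: drift_scale and observation_noise_scale each have
# batch shape [2]; initial_state_prior and design_matrix are unbatched.
batch_shape = broadcast_batch_shape((2,), (2,), (), ())
sample_shape = (3,)
sample_result_shape = sample_shape + batch_shape + event_shape
log_prob_shape = sample_shape + batch_shape
print(sample_result_shape, log_prob_shape)

# Hypothetical variant: a design_matrix of shape [5, 1, 42, 2] contributes
# batch shape [5, 1], which broadcasts against the scalar parameters.
design_batched_shape = broadcast_batch_shape((2,), (2,), (), (5, 1))
print(design_batched_shape)
```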

__init__

__init__(
    num_timesteps,
    design_matrix,
    drift_scale,
    initial_state_prior,
    observation_noise_scale=0.0,
    initial_step=0,
    validate_args=False,
    allow_nan_stats=True,
    name=None
)

State space model for a dynamic linear regression.

Args:

  • num_timesteps: Scalar int Tensor number of timesteps to model with this distribution.
  • design_matrix: float Tensor of shape concat([batch_shape, [num_timesteps, num_features]]).
  • drift_scale: Scalar (any additional dimensions are treated as batch dimensions) float Tensor indicating the standard deviation of the latent state transitions.
  • initial_state_prior: instance of tfd.MultivariateNormal representing the prior distribution on latent states. Must have event shape [num_features].
  • observation_noise_scale: Scalar (any additional dimensions are treated as batch dimensions) float Tensor indicating the standard deviation of the observation noise. Default value: 0..
  • initial_step: scalar int Tensor specifying the starting timestep. Default value: 0.
  • validate_args: Python bool. Whether to validate input with asserts. If validate_args is False, and the inputs are invalid, correct behavior is not guaranteed. Default value: False.
  • allow_nan_stats: Python bool. If False, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If True, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: True.
  • name: Python str name prefixed to ops created by this class. Default value: 'DynamicLinearRegressionStateSpaceModel'.

Properties

allow_nan_stats

Python bool describing behavior when a stat is undefined.

Stats return +/- infinity when it makes sense. E.g., the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined, e.g., if a distribution's pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. E.g. the mean for Student's T for df = 1 is undefined (no clear way to say it is either + or - infinity), so the variance = E[(X - mean)**2] is also undefined.

Returns:

  • allow_nan_stats: Python bool.

batch_shape

Shape of a single sample from a single event index as a TensorShape.

May be partially defined or unknown.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

Returns:

  • batch_shape: TensorShape, possibly unknown.

drift_scale

Standard deviation of the drift in weights at each timestep.

dtype

The DType of Tensors handled by this Distribution.

event_shape

Shape of a single sample from a single batch as a TensorShape.

May be partially defined or unknown.

Returns:

  • event_shape: TensorShape, possibly unknown.

name

Name prepended to all ops created by this Distribution.

observation_noise_scale

Standard deviation of the observation noise.

parameters

Dictionary of parameters used to instantiate this Distribution.

reparameterization_type

Describes how samples from the distribution are reparameterized.

Currently this is one of the static instances tfd.FULLY_REPARAMETERIZED or tfd.NOT_REPARAMETERIZED.

Returns:

An instance of ReparameterizationType.

validate_args

Python bool indicating possibly expensive checks are enabled.

Methods

__getitem__

__getitem__(slices)

Slices the batch axes of this distribution, returning a new instance.

b = tfd.Bernoulli(logits=tf.zeros([3, 5, 7, 9]))
b.batch_shape  # => [3, 5, 7, 9]
b2 = b[:, tf.newaxis, ..., -2:, 1::2]
b2.batch_shape  # => [3, 1, 5, 2, 4]

x = tf.random.normal([5, 3, 2, 2])
cov = tf.matmul(x, x, transpose_b=True)
chol = tf.linalg.cholesky(cov)
loc = tf.random.normal([4, 1, 3, 1])
mvn = tfd.MultivariateNormalTriL(loc, chol)
mvn.batch_shape  # => [4, 5, 3]
mvn.event_shape  # => [2]
mvn2 = mvn[:, 3:, ..., ::-1, tf.newaxis]
mvn2.batch_shape  # => [4, 2, 3, 1]
mvn2.event_shape  # => [2]

Args:

  • slices: slices from the [] operator

Returns:

  • dist: A new tfd.Distribution instance with sliced parameters.
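The batch shapes quoted in the examples above can be reproduced with plain NumPy indexing. This sketch assumes that slicing a distribution behaves like basic NumPy indexing applied to an array whose shape equals batch_shape; it does not construct any distribution.

```python
import numpy as np

# Stand-in for the Bernoulli batch of shape [3, 5, 7, 9].
bernoulli_batch = np.zeros([3, 5, 7, 9])
b2_shape = bernoulli_batch[:, np.newaxis, ..., -2:, 1::2].shape
print(b2_shape)

# Stand-in for the MultivariateNormalTriL batch of shape [4, 5, 3]
# (the event dimension of size 2 is not part of the batch).
mvn_batch = np.zeros([4, 5, 3])
mvn2_shape = mvn_batch[:, 3:, ..., ::-1, np.newaxis].shape
print(mvn2_shape)
```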

__iter__

__iter__()

backward_smoothing_pass

backward_smoothing_pass(
    filtered_means,
    filtered_covs,
    predicted_means,
    predicted_covs
)

Run the backward pass in Kalman smoother.

The backward pass uses the Rauch-Tung-Striebel smoother, as discussed in section 18.3.2 of Kevin P. Murphy, 2012, Machine Learning: A Probabilistic Perspective, The MIT Press. The inputs are the values returned by the forward_filter function.

Args:

  • filtered_means: Means of the per-timestep filtered marginal distributions p(z_t | x_{:t}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, latent_size].
  • filtered_covs: Covariances of the per-timestep filtered marginal distributions p(z_t | x_{:t}), as a Tensor of shape batch_shape + [num_timesteps, latent_size, latent_size].
  • predicted_means: Means of the per-timestep predictive distributions over latent states, p(z_{t+1} | x_{:t}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, latent_size].
  • predicted_covs: Covariances of the per-timestep predictive distributions over latent states, p(z_{t+1} | x_{:t}), as a Tensor of shape batch_shape + [num_timesteps, latent_size, latent_size].

Returns:

  • posterior_means: Means of the smoothed marginal distributions p(z_t | x_{1:T}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, latent_size], which is the same shape as filtered_means.
  • posterior_covs: Covariances of the smoothed marginal distributions p(z_t | x_{1:T}), as a Tensor of shape batch_shape + [num_timesteps, latent_size, latent_size], which is the same shape as filtered_covs.
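The backward recursion can be sketched in NumPy for the identity transition matrix used by this model. This is a minimal illustration rather than the library's code: the function name rts_backward_pass is hypothetical, and the toy inputs come from a scalar random walk (latent_size = 1) observed directly, filtered with an assumed Normal(0, 1) prior. Following the argument descriptions above, predicted_means[t] and predicted_covs[t] describe z_{t+1} given x_{:t}.

```python
import numpy as np

def rts_backward_pass(filtered_means, filtered_covs,
                      predicted_means, predicted_covs):
    """Rauch-Tung-Striebel recursion, assuming transition_matrix = identity."""
    smoothed_means = filtered_means.copy()
    smoothed_covs = filtered_covs.copy()
    for t in range(len(filtered_means) - 2, -1, -1):
        # Smoother gain J = filtered_cov[t] @ inv(predicted_cov[t]);
        # both matrices are symmetric, so a solve plus transpose suffices.
        gain = np.linalg.solve(predicted_covs[t], filtered_covs[t]).T
        smoothed_means[t] = filtered_means[t] + gain @ (
            smoothed_means[t + 1] - predicted_means[t])
        smoothed_covs[t] = filtered_covs[t] + gain @ (
            smoothed_covs[t + 1] - predicted_covs[t]) @ gain.T
    return smoothed_means, smoothed_covs

# Toy forward pass: scalar random walk observed with Gaussian noise.
drift_scale, observation_noise_scale = 0.5, 1.0
x = np.array([0.3, -0.1, 0.8, 1.2, 0.9])
mean, var = 0.0, 1.0  # assumed prior on the initial state
f_means, f_covs, p_means, p_covs = [], [], [], []
for x_t in x:
    gain = var / (var + observation_noise_scale**2)
    mean, var = mean + gain * (x_t - mean), (1.0 - gain) * var
    f_means.append([mean])
    f_covs.append([[var]])
    var = var + drift_scale**2  # predict z[t+1] given x[:t]
    p_means.append([mean])
    p_covs.append([[var]])

filtered_means, filtered_covs = np.array(f_means), np.array(f_covs)
predicted_means, predicted_covs = np.array(p_means), np.array(p_covs)
smoothed_means, smoothed_covs = rts_backward_pass(
    filtered_means, filtered_covs, predicted_means, predicted_covs)
print(smoothed_means.shape, smoothed_covs.shape)
```

At the final timestep the smoothed and filtered values coincide, because no later observations exist to refine the estimate.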

batch_shape_tensor

batch_shape_tensor(name='batch_shape_tensor')

Shape of a single sample from a single event index as a 1-D Tensor.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

Args:

  • name: name to give to the op

Returns:

  • batch_shape: Tensor.

cdf

cdf(
    value,
    name='cdf',
    **kwargs
)

Cumulative distribution function.

Given random variable X, the cumulative distribution function cdf is:

cdf(x) := P[X <= x]

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • cdf: a Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

copy

copy(**override_parameters_kwargs)

Creates a deep copy of the distribution.

Args:

  • **override_parameters_kwargs: String/value dictionary of initialization arguments to override with new values.

Returns:

  • distribution: A new instance of type(self) initialized from the union of self.parameters and override_parameters_kwargs, i.e., dict(self.parameters, **override_parameters_kwargs).

covariance

covariance(
    name='covariance',
    **kwargs
)

Covariance.

Covariance is (possibly) defined only for non-scalar-event distributions.

For example, for a length-k, vector-valued distribution, it is calculated as,

Cov[i, j] = Covariance(X_i, X_j) = E[(X_i - E[X_i]) (X_j - E[X_j])]

where Cov is a (batch of) k x k matrix, 0 <= (i, j) < k, and E denotes expectation.

Alternatively, for non-vector, multivariate distributions (e.g., matrix-valued, Wishart), Covariance shall return a (batch of) matrices under some vectorization of the events, i.e.,

Cov[i, j] = Covariance(Vec(X)_i, Vec(X)_j) = [as above]

where Cov is a (batch of) k' x k' matrices, 0 <= (i, j) < k' = reduce_prod(event_shape), and Vec is some function mapping indices of this distribution's event dimensions to indices of a length-k' vector.

Args:

  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • covariance: Floating-point Tensor with shape [B1, ..., Bn, k', k'] where the first n dimensions are batch coordinates and k' = reduce_prod(self.event_shape).

cross_entropy

cross_entropy(
    other,
    name='cross_entropy'
)

Computes the (Shannon) cross entropy.

Denote this distribution (self) by P and the other distribution by Q. Assuming P, Q are absolutely continuous with respect to one another and permit densities p(x) dr(x) and q(x) dr(x), (Shannon) cross entropy is defined as:

H[P, Q] = E_p[-log q(X)] = -int_F p(x) log q(x) dr(x)

where F denotes the support of the random variable X ~ P.

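Args:

  • other: tfp.distributions.Distribution instance.
  • name: Python str prepended to names of ops created by this function.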

Returns:

  • cross_entropy: self.dtype Tensor with shape [B1, ..., Bn] representing n different calculations of (Shannon) cross entropy.

entropy

entropy(
    name='entropy',
    **kwargs
)

Shannon entropy in nats.

event_shape_tensor

event_shape_tensor(name='event_shape_tensor')

Shape of a single sample from a single batch as a 1-D int32 Tensor.

Args:

  • name: name to give to the op

Returns:

  • event_shape: Tensor.

forward_filter

forward_filter(
    x,
    mask=None
)

Run a Kalman filter over a provided sequence of outputs.

Note that the returned values filtered_means, predicted_means, and observation_means depend on the observed time series x, while the corresponding covariances are independent of the observed series; i.e., they depend only on the model itself. This means that the mean values have shape concat([sample_shape(x), batch_shape, [num_timesteps, {latent/observation}_size]]), while the covariances have shape concat([batch_shape, [num_timesteps, {latent/observation}_size, {latent/observation}_size]]), which does not depend on the sample shape.

Args:

  • x: a float-type Tensor with rightmost dimensions [num_timesteps, observation_size] matching self.event_shape. Additional dimensions must match or be broadcastable to self.batch_shape; any further dimensions are interpreted as a sample shape.
  • mask: optional bool-type Tensor with rightmost dimension [num_timesteps]; True values specify that the value of x at that timestep is masked, i.e., not conditioned on. Additional dimensions must match or be broadcastable to self.batch_shape; any further dimensions must match or be broadcastable to the sample shape of x. Default value: None.

Returns:

  • log_likelihoods: Per-timestep log marginal likelihoods log p(x_t | x_{:t-1}) evaluated at the input x, as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps].
  • filtered_means: Means of the per-timestep filtered marginal distributions p(z_t | x_{:t}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, latent_size].
  • filtered_covs: Covariances of the per-timestep filtered marginal distributions p(z_t | x_{:t}), as a Tensor of shape sample_shape(mask) + batch_shape + [num_timesteps, latent_size, latent_size]. Note that the covariances depend only on the model and the mask, not on the data, so this may have fewer dimensions than filtered_means.
  • predicted_means: Means of the per-timestep predictive distributions over latent states, p(z_{t+1} | x_{:t}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, latent_size].
  • predicted_covs: Covariances of the per-timestep predictive distributions over latent states, p(z_{t+1} | x_{:t}), as a Tensor of shape sample_shape(mask) + batch_shape + [num_timesteps, latent_size, latent_size]. Note that the covariances depend only on the model and the mask, not on the data, so this may have fewer dimensions than predicted_means.
  • observation_means: Means of the per-timestep predictive distributions over observations, p(x_t | x_{:t-1}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, observation_size].
  • observation_covs: Covariances of the per-timestep predictive distributions over observations, p(x_t | x_{:t-1}), as a Tensor of shape sample_shape(mask) + batch_shape + [num_timesteps, observation_size, observation_size]. Note that the covariances depend only on the model and the mask, not on the data, so this may have fewer dimensions than observation_means.
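A Kalman filter for the dynamic regression model can be sketched in NumPy as follows. This is an unbatched, unmasked illustration under stated assumptions, not the library's implementation: the function name kalman_forward_filter and the prior_mean and prior_cov arguments are hypothetical. It also shows the property noted above, that the covariances do not depend on the observed series.

```python
import math
import numpy as np

def kalman_forward_filter(x, design_matrix, drift_scale,
                          observation_noise_scale, prior_mean, prior_cov):
    """Kalman filter with identity transitions and per-step design rows."""
    num_timesteps, num_features = design_matrix.shape
    transition_cov = drift_scale**2 * np.eye(num_features)
    mean = np.asarray(prior_mean, dtype=float)
    cov = np.asarray(prior_cov, dtype=float)
    names = ("log_likelihoods", "filtered_means", "filtered_covs",
             "predicted_means", "predicted_covs",
             "observation_means", "observation_covs")
    out = {name: [] for name in names}
    for t in range(num_timesteps):
        h = design_matrix[t]  # observation_matrix[t], shape [num_features]
        obs_mean = h @ mean
        obs_var = h @ cov @ h + observation_noise_scale**2
        residual = x[t, 0] - obs_mean
        out["log_likelihoods"].append(
            -0.5 * (math.log(2.0 * math.pi * obs_var) + residual**2 / obs_var))
        out["observation_means"].append([obs_mean])
        out["observation_covs"].append([[obs_var]])
        # Condition on x[t].
        gain = cov @ h / obs_var
        mean = mean + gain * residual
        cov = cov - np.outer(gain, h @ cov)
        out["filtered_means"].append(mean)
        out["filtered_covs"].append(cov)
        # Predict z[t+1]: the mean is unchanged, the covariance grows.
        cov = cov + transition_cov
        out["predicted_means"].append(mean)
        out["predicted_covs"].append(cov)
    return {name: np.array(values) for name, values in out.items()}

rng = np.random.default_rng(0)
design_matrix = rng.standard_normal((10, 2))
x_a = rng.standard_normal((10, 1))
x_b = rng.standard_normal((10, 1))
result_a = kalman_forward_filter(x_a, design_matrix, 0.1, 0.5,
                                 np.zeros(2), np.eye(2))
result_b = kalman_forward_filter(x_b, design_matrix, 0.1, 0.5,
                                 np.zeros(2), np.eye(2))
# Different data, identical covariances.
print(np.allclose(result_a["filtered_covs"], result_b["filtered_covs"]))
```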

is_scalar_batch

is_scalar_batch(name='is_scalar_batch')

Indicates that batch_shape == [].

Args:

  • name: Python str prepended to names of ops created by this function.

Returns:

  • is_scalar_batch: bool scalar Tensor.

is_scalar_event

is_scalar_event(name='is_scalar_event')

Indicates that event_shape == [].

Args:

  • name: Python str prepended to names of ops created by this function.

Returns:

  • is_scalar_event: bool scalar Tensor.

kl_divergence

kl_divergence(
    other,
    name='kl_divergence'
)

Computes the Kullback--Leibler divergence.

Denote this distribution (self) by p and the other distribution by q. Assuming p, q are absolutely continuous with respect to reference measure r, the KL divergence is defined as:

KL[p, q] = E_p[log(p(X)/q(X))]
         = -int_F p(x) log q(x) dr(x) + int_F p(x) log p(x) dr(x)
         = H[p, q] - H[p]

where F denotes the support of the random variable X ~ p, H[., .] denotes (Shannon) cross entropy, and H[.] denotes (Shannon) entropy.
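The identity KL[p, q] = H[p, q] - H[p] can be checked numerically with closed-form expressions for univariate Gaussians. This sketch uses only the standard library and does not call any distribution class; the three helper functions are illustrative.

```python
import math

def normal_entropy(sigma_p):
    """Shannon entropy (in nats) of Normal(mu_p, sigma_p)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma_p**2)

def normal_cross_entropy(mu_p, sigma_p, mu_q, sigma_q):
    """H[p, q] = E_p[-log q(X)] for two univariate Gaussians."""
    return (0.5 * math.log(2.0 * math.pi * sigma_q**2)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2))

def normal_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL[p, q] for two univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2) - 0.5)

mu_p, sigma_p, mu_q, sigma_q = 0.0, 1.0, 1.5, 2.0
kl_direct = normal_kl(mu_p, sigma_p, mu_q, sigma_q)
kl_from_entropies = (normal_cross_entropy(mu_p, sigma_p, mu_q, sigma_q)
                     - normal_entropy(sigma_p))
print(kl_direct, kl_from_entropies)
```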

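Args:

  • other: tfp.distributions.Distribution instance.
  • name: Python str prepended to names of ops created by this function.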

Returns:

  • kl_divergence: self.dtype Tensor with shape [B1, ..., Bn] representing n different calculations of the Kullback-Leibler divergence.

latents_to_observations

latents_to_observations(
    latent_means,
    latent_covs
)

Push latent means and covariances forward through the observation model.

Args:

  • latent_means: float Tensor of shape [..., num_timesteps, latent_size]
  • latent_covs: float Tensor of shape [..., num_timesteps, latent_size, latent_size].

Returns:

  • observation_means: float Tensor of shape [..., num_timesteps, observation_size]
  • observation_covs: float Tensor of shape [..., num_timesteps, observation_size, observation_size]
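For this model the pushforward amounts to a per-timestep dot product with the design matrix row. The NumPy sketch below is unbatched and assumes that the observation noise variance is added to the projected covariance; the standalone function and its extra arguments are hypothetical, since the method itself takes only latent_means and latent_covs.

```python
import numpy as np

def latents_to_observations(latent_means, latent_covs,
                            design_matrix, observation_noise_scale):
    """Maps latent moments to observation moments, one timestep at a time."""
    # Mean: design_matrix[t] . latent_means[t]
    observation_means = np.einsum("tk,tk->t", design_matrix, latent_means)
    # Variance: design_matrix[t] . latent_covs[t] . design_matrix[t] + noise
    observation_vars = np.einsum(
        "tj,tjk,tk->t", design_matrix, latent_covs, design_matrix)
    observation_vars = observation_vars + observation_noise_scale**2
    # Restore the trailing observation_size = 1 dimensions.
    return observation_means[:, None], observation_vars[:, None, None]

num_timesteps = 4
design_matrix = np.ones((num_timesteps, 2))
latent_means = np.tile([1.0, 2.0], (num_timesteps, 1))
latent_covs = np.tile(np.eye(2), (num_timesteps, 1, 1))
observation_means, observation_covs = latents_to_observations(
    latent_means, latent_covs, design_matrix, observation_noise_scale=0.5)
print(observation_means.shape, observation_covs.shape)
```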

log_cdf

log_cdf(
    value,
    name='log_cdf',
    **kwargs
)

Log cumulative distribution function.

Given random variable X, the cumulative distribution function cdf is:

log_cdf(x) := Log[ P[X <= x] ]

Often, a numerical approximation can be used for log_cdf(x) that yields a more accurate answer than simply taking the logarithm of the cdf when x << -1.
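The numerical issue can be illustrated with a standard normal, using only the standard library. The sketch compares taking the logarithm of the cdf directly against a leading-order tail approximation; the approximation is chosen here for illustration and is not necessarily what any particular distribution uses.

```python
import math

def normal_log_cdf_naive(x):
    """log(cdf(x)) for a standard normal, taking the logarithm last."""
    cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
    # The cdf underflows to exactly 0.0 far in the left tail.
    return math.log(cdf) if cdf > 0.0 else float("-inf")

def normal_log_cdf_asymptotic(x):
    """Leading-order tail approximation, meaningful only for x << -1."""
    return -0.5 * x * x - math.log(-x) - 0.5 * math.log(2.0 * math.pi)

for x in (-10.0, -40.0):
    print(x, normal_log_cdf_naive(x), normal_log_cdf_asymptotic(x))
```

At x = -10 the two values agree closely, while at x = -40 the naive computation underflows and the approximation remains finite.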

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • logcdf: a Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

log_prob

log_prob(
    value,
    name='log_prob',
    **kwargs
)

Log probability density/mass function.

Additional documentation from LinearGaussianStateSpaceModel:

kwargs:
  • mask: optional bool-type Tensor with rightmost dimension [num_timesteps]; True values specify that the value of x at that timestep is masked, i.e., not conditioned on. Additional dimensions must match or be broadcastable to self.batch_shape; any further dimensions must match or be broadcastable to the sample shape of x. Default value: None.

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • log_prob: a Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

log_survival_function

log_survival_function(
    value,
    name='log_survival_function',
    **kwargs
)

Log survival function.

Given random variable X, the survival function is defined:

log_survival_function(x) = Log[ P[X > x] ]
                         = Log[ 1 - P[X <= x] ]
                         = Log[ 1 - cdf(x) ]

Typically, different numerical approximations can be used for the log survival function, which are more accurate than 1 - cdf(x) when x >> 1.

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

mean

mean(
    name='mean',
    **kwargs
)

Mean.

mode

mode(
    name='mode',
    **kwargs
)

Mode.

param_shapes

param_shapes(
    cls,
    sample_shape,
    name='DistributionParamShapes'
)

Shapes of parameters given the desired shape of a call to sample().

This is a class method that describes what key/value arguments are required to instantiate the given Distribution so that a particular shape is returned for that instance's call to sample().

Subclasses should override class method _param_shapes.

Args:

  • sample_shape: Tensor or python list/tuple. Desired shape of a call to sample().
  • name: name to prepend ops with.

Returns:

dict of parameter name to Tensor shapes.

param_static_shapes

param_static_shapes(
    cls,
    sample_shape
)

param_shapes with static (i.e. TensorShape) shapes.

This is a class method that describes what key/value arguments are required to instantiate the given Distribution so that a particular shape is returned for that instance's call to sample(). Assumes that the sample's shape is known statically.

Subclasses should override class method _param_shapes to return constant-valued tensors when constant values are fed.

Args:

  • sample_shape: TensorShape or python list/tuple. Desired shape of a call to sample().

Returns:

dict of parameter name to TensorShape.

Raises:

  • ValueError: if sample_shape is a TensorShape and is not fully defined.

posterior_marginals

posterior_marginals(
    x,
    mask=None
)

Run a Kalman smoother to return posterior mean and cov.

Note that the returned smoothed_means depend on the observed time series x, while the smoothed_covs are independent of the observed series; i.e., they depend only on the model itself. This means that the mean values have shape concat([sample_shape(x), batch_shape, [num_timesteps, latent_size]]), while the covariances have shape concat([batch_shape, [num_timesteps, latent_size, latent_size]]), which does not depend on the sample shape.

This function only returns the smoothed values. To also obtain the intermediate values returned by the forward_filter pass, run the two passes separately:

(log_likelihoods,
 filtered_means, filtered_covs,
 predicted_means, predicted_covs,
 observation_means, observation_covs) = model.forward_filter(x)
# The backward pass consumes the filtered and predicted moments, not x.
smoothed_means, smoothed_covs = model.backward_smoothing_pass(
    filtered_means, filtered_covs, predicted_means, predicted_covs)

where x is an observation sequence.

Args:

  • x: a float-type Tensor with rightmost dimensions [num_timesteps, observation_size] matching self.event_shape. Additional dimensions must match or be broadcastable to self.batch_shape; any further dimensions are interpreted as a sample shape.
  • mask: optional bool-type Tensor with rightmost dimension [num_timesteps]; True values specify that the value of x at that timestep is masked, i.e., not conditioned on. Additional dimensions must match or be broadcastable to self.batch_shape; any further dimensions must match or be broadcastable to the sample shape of x. Default value: None.

Returns:

  • smoothed_means: Means of the per-timestep smoothed distributions over latent states, p(z_t | x_{:T}), as a Tensor of shape sample_shape(x) + batch_shape + [num_timesteps, latent_size].
  • smoothed_covs: Covariances of the per-timestep smoothed distributions over latent states, p(z_t | x_{:T}), as a Tensor of shape sample_shape(mask) + batch_shape + [num_timesteps, latent_size, latent_size]. Note that the covariances depend only on the model and the mask, not on the data, so this may have fewer dimensions than smoothed_means.

prob

prob(
    value,
    name='prob',
    **kwargs
)

Probability density/mass function.

Additional documentation from LinearGaussianStateSpaceModel:

kwargs:
  • mask: optional bool-type Tensor with rightmost dimension [num_timesteps]; True values specify that the value of x at that timestep is masked, i.e., not conditioned on. Additional dimensions must match or be broadcastable to self.batch_shape; any further dimensions must match or be broadcastable to the sample shape of x. Default value: None.

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • prob: a Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

quantile

quantile(
    value,
    name='quantile',
    **kwargs
)

Quantile function. Aka "inverse cdf" or "percent point function".

Given random variable X and p in [0, 1], the quantile is:

quantile(p) := x such that P[X <= x] == p

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • quantile: a Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

sample

sample(
    sample_shape=(),
    seed=None,
    name='sample',
    **kwargs
)

Generate samples of the specified shape.

Note that a call to sample() without arguments will generate a single sample.

Args:

  • sample_shape: 0D or 1D int32 Tensor. Shape of the generated samples.
  • seed: Python integer seed for RNG
  • name: name to give to the op.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • samples: a Tensor with prepended dimensions sample_shape.

stddev

stddev(
    name='stddev',
    **kwargs
)

Standard deviation.

Standard deviation is defined as,

stddev = E[(X - E[X])**2]**0.5

where X is the random variable associated with this distribution, E denotes expectation, and stddev.shape = batch_shape + event_shape.

Args:

  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • stddev: Floating-point Tensor with shape identical to batch_shape + event_shape, i.e., the same shape as self.mean().

survival_function

survival_function(
    value,
    name='survival_function',
    **kwargs
)

Survival function.

Given random variable X, the survival function is defined:

survival_function(x) = P[X > x]
                     = 1 - P[X <= x]
                     = 1 - cdf(x).

Args:

  • value: float or double Tensor.
  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

Tensor of shape sample_shape(x) + self.batch_shape with values of type self.dtype.

variance

variance(
    name='variance',
    **kwargs
)

Variance.

Variance is defined as,

Var = E[(X - E[X])**2]

where X is the random variable associated with this distribution, E denotes expectation, and Var.shape = batch_shape + event_shape.

Args:

  • name: Python str prepended to names of ops created by this function.
  • **kwargs: Named arguments forwarded to subclass implementation.

Returns:

  • variance: Floating-point Tensor with shape identical to batch_shape + event_shape, i.e., the same shape as self.mean().