tfp.substrates.numpy.sts.MaskedTimeSeries

Named tuple encoding a time series Tensor and optional missingness mask.

Structural time series models handle missing values naturally, following the rules of conditional probability. Posterior inference can be used to impute missing values, with uncertainties. Forecasting and posterior decomposition are also supported for time series with missing values; the missing values will generally lead to correspondingly higher forecast uncertainty.

All methods in the tfp.sts API that accept an observed_time_series Tensor should optionally also accept a MaskedTimeSeries instance.

The time series should be a float Tensor of shape [..., num_timesteps] or [..., num_timesteps, 1]. The is_missing mask must be either a boolean Tensor of shape [..., num_timesteps], or None. True values in is_missing denote missing (masked) observations; False denotes observed (unmasked) values. Note that these semantics are the opposite of those used by low-level TensorFlow methods like tf.boolean_mask, but consistent with the behavior of NumPy masked arrays.
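The mask convention can be illustrated with plain NumPy. This is a minimal sketch, not part of the tfp.sts API: the array values are made up for illustration, and boolean indexing is used here as a stand-in for the keep-where-True behavior of tf.boolean_mask.

```python
import numpy as np

values = np.array([-1.0, 1.0, 0.0, 2.4, 0.0, 5.0])
# True marks a missing observation, as in MaskedTimeSeries.is_missing.
is_missing = np.array([False, False, True, False, True, False])

# NumPy masked arrays share this convention: True entries are masked out.
masked = np.ma.masked_array(values, mask=is_missing)
num_observed = int(masked.count())

# Boolean indexing keeps the True entries (the same convention as
# tf.boolean_mask), so the mask is inverted to select observed values.
observed_values = values[~is_missing]
```

Inverting the mask with `~` is the only step needed to move between the two conventions.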

The batch dimensions of is_missing must broadcast with the batch dimensions of time_series.
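The broadcasting requirement can be checked ahead of time with NumPy shape arithmetic. The sketch below assumes a time series of shape [..., num_timesteps] (no trailing unit dimension) and NumPy 1.20 or later for np.broadcast_shapes; the helper batch_shapes_broadcast is hypothetical and is not provided by tfp.sts.

```python
import numpy as np

time_series_shape = (3, 4, 5)  # batch shape [3, 4], num_timesteps = 5
shared_mask = np.array([True, False, False, True, False])  # shape [5]

# A mask with no batch dimensions broadcasts to every series in the batch.
full_mask = np.broadcast_to(shared_mask, time_series_shape)


def batch_shapes_broadcast(series_shape, mask_shape):
    """Returns True if the batch dims (all but the last) are compatible.

    Hypothetical helper: it checks only the batch dimensions and does not
    compare the num_timesteps dimension.
    """
    try:
        np.broadcast_shapes(series_shape[:-1], mask_shape[:-1])
        return True
    except ValueError:
        return False


ok_per_row = batch_shapes_broadcast((3, 4, 5), (4, 5))  # [3, 4] vs [4]
bad_batch = batch_shapes_broadcast((3, 4, 5), (2, 5))   # [3, 4] vs [2]
```

Here ok_per_row is True because batch shape [4] broadcasts against [3, 4], while bad_batch is False because [2] does not.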

A MaskedTimeSeries is just a collections.namedtuple instance, i.e., a dumb container. Although the convention for the elements is as described here, it is left to downstream methods to validate or convert the elements as required. In particular, most downstream methods will call tf.convert_to_tensor on the components. To prevent duplicate Tensor creation, you may (if memory is an issue) wish to ensure that the components are already Tensors, as opposed to NumPy arrays or similar.
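To make the "dumb container" point concrete, here is a small standard-library sketch. MaskedSeriesSketch is a hypothetical stand-in with the same two field names, not the actual tfp.sts class; it shows only how a collections.namedtuple behaves.

```python
import collections

# Hypothetical stand-in with the same two fields as MaskedTimeSeries.
MaskedSeriesSketch = collections.namedtuple(
    'MaskedSeriesSketch', ['time_series', 'is_missing'])

# A plain namedtuple performs no validation: the mask below has the wrong
# length (2 entries for 3 timesteps), yet construction succeeds.
series = MaskedSeriesSketch(
    time_series=[1.0, 2.0, 3.0], is_missing=[True, False])

# Fields are reachable by name or by position, and the tuple unpacks.
values, mask = series
same_object = series.time_series is series[0]
length_mismatch = len(values) != len(mask)
```

Because nothing is checked at construction time, a shape error like this one would surface only later, inside whichever downstream method consumes the tuple.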

Examples

To construct a simple MaskedTimeSeries instance:

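# tf.random.stateless_normal requires an explicit seed argument, so this
# example draws the values with NumPy instead (assumes `import numpy as np`).
observed_time_series = tfp.sts.MaskedTimeSeries(
  time_series=np.random.normal(size=[3, 4, 5]),  # batch shape [3, 4]
  is_missing=[True, False, False, True, False])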

Note that the mask we specified will broadcast against the batch dimensions of the time series.

For time series with missing entries specified as NaN 'magic values', you can generate a mask using tf.math.is_nan:

import numpy as np
from tensorflow_probability.python.internal.backend import numpy as tf
import tensorflow_probability as tfp; tfp = tfp.substrates.numpy

time_series_with_nans = [-1., 1., np.nan, 2.4, np.nan, 5]
observed_time_series = tfp.sts.MaskedTimeSeries(
  time_series=time_series_with_nans,
  is_missing=tf.math.is_nan(time_series_with_nans))

# Build model using observed time series to set heuristic priors.
linear_trend_model = tfp.sts.LocalLinearTrend(
  observed_time_series=observed_time_series)
model = tfp.sts.Sum([linear_trend_model],
                    observed_time_series=observed_time_series)

# Fit model to data
parameter_samples, _ = tfp.sts.fit_with_hmc(model, observed_time_series)

# Forecast, using the posterior parameter samples from the fit above.
forecast_dist = tfp.sts.forecast(
  model, observed_time_series,
  parameter_samples=parameter_samples,
  num_steps_forecast=5)

# Impute missing values, again conditioning on the fitted parameter samples.
observations_dist = tfp.sts.impute_missing_values(
  model, observed_time_series, parameter_samples)
print('imputed means and stddevs: ',
      observations_dist.mean(),
      observations_dist.stddev())

time_series A namedtuple alias for field number 0
is_missing A namedtuple alias for field number 1