tfp.experimental.stats.RunningVariance

Holds metadata for and facilitates variance computation.

Inherits From: RunningCovariance

RunningVariance objects do not hold state information. That information, which includes intermediate calculations, is held in a RunningCovarianceState returned by the initialize and update methods.
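The split between metadata and state can be illustrated with a minimal NumPy sketch. `VarianceSketch` and `VarianceState` are hypothetical names, not TFP API, and the arithmetic is a plain one-sample (Welford-style) recurrence rather than TFP's actual implementation. The object keeps only `shape` and `dtype`, while every call to `initialize` or `update` returns a new state tuple.

```python
from typing import NamedTuple

import numpy as np


class VarianceState(NamedTuple):
    """Hypothetical running state: count, mean, and sum of squared residuals."""
    num_samples: np.ndarray
    mean: np.ndarray
    sum_squared_residuals: np.ndarray


class VarianceSketch:
    """Hypothetical stand-in: stores only metadata, never running values."""

    def __init__(self, shape=(), dtype=np.float32):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    def initialize(self):
        # State for a stream with no inputs yet.
        zeros = np.zeros(self.shape, dtype=self.dtype)
        return VarianceState(np.zeros((), dtype=self.dtype), zeros, zeros)

    def update(self, state, new_sample):
        # One-sample (Welford-style) recurrence; returns a new state object
        # instead of mutating `state` or `self`.
        x = np.asarray(new_sample, dtype=self.dtype)
        n = state.num_samples + 1
        delta = x - state.mean
        mean = state.mean + delta / n
        m2 = state.sum_squared_residuals + delta * (x - mean)
        return VarianceState(n, mean, m2)

    def finalize(self, state, ddof=0):
        # Divisor N - ddof: 0 for population, 1 for sample variance.
        return state.sum_squared_residuals / (state.num_samples - ddof)


rv = VarianceSketch(shape=(2,))
state = rv.initialize()
for row in [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]]:
    state = rv.update(state, row)
variance = rv.finalize(state)
```

Because the object carries no running values, one instance can drive any number of independent streams, each with its own state.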

RunningVariance is meant to serve general streaming variance needs. For a specialized version tailored to streaming over MCMC samples, see VarianceReducer in tfp.experimental.mcmc.

Args
shape: Python tuple or TensorShape representing the shape of incoming samples. Defaults to a scalar shape.
dtype: Dtype of incoming samples and of the resulting statistics. Defaults to tf.float32. Integer dtypes are cast to corresponding float dtypes (e.g. tf.int32 is cast to tf.float32), because intermediate calculations perform floating-point division.

Methods

finalize

Finalizes running covariance computation for the state.

Args
state: A RunningCovarianceState representing the current state of running statistics.
ddof: Requested delta degrees of freedom for the covariance calculation. For example, use ddof=0 for population covariance and ddof=1 for sample covariance. Defaults to 0 (population covariance).
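Written out, and assuming the NumPy-style convention that ddof is subtracted from the sample count in the divisor, the finalized estimate for N observed samples $x_1, \dots, x_N$ with mean $\bar{x}$ is:

```latex
\widehat{\Sigma} = \frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{\top}
```

In the scalar (variance) case the outer product reduces to $(x_i - \bar{x})^2$, so ddof=0 divides by N and ddof=1 divides by N - 1.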

Returns
covariance: An estimate of the covariance.

initialize

Initializes a RunningCovarianceState using previously defined metadata.

Returns
state: A RunningCovarianceState representing a stream with no inputs.

update

Updates the RunningCovarianceState with a new sample.

The update formula is from Philippe Pebay (2008) [1]. This implementation supports both batched and chunked covariance computation. A "batch" is the usual parallel computation: a batch of size N means N independent covariance computations, each advancing one sample (or chunk) at a time. A "chunk" of size M means incorporating M samples into a single covariance computation at once, which is more efficient than incorporating them one by one.
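The chunked update can be sketched in NumPy with the pairwise combination rule for the second central moment from Pebay (2008): summarize the incoming chunk by its own count, mean, and sum of squared residuals, then merge that summary into the running one. `merge_chunk` is a hypothetical helper, not TFP API, and this is only the general shape of the technique, not TFP's code. Axes other than `axis` act as batch dimensions, so each column below is an independent variance computation.

```python
import numpy as np


def merge_chunk(count, mean, m2, chunk, axis=0):
    """Fold `chunk` into running (count, mean, M2) statistics.

    Hypothetical helper. `chunk` stacks M samples along `axis`; any other
    axes are batch dimensions, each an independent variance computation.
    """
    chunk = np.asarray(chunk, dtype=np.float64)
    m = chunk.shape[axis]
    chunk_mean = chunk.mean(axis=axis)
    chunk_m2 = ((chunk - np.expand_dims(chunk_mean, axis)) ** 2).sum(axis=axis)
    total = count + m
    delta = chunk_mean - mean
    # Pairwise combination of two partial results A (running) and B (chunk):
    #   mean = mean_A + delta * n_B / n
    #   M2   = M2_A + M2_B + delta**2 * n_A * n_B / n
    new_mean = mean + delta * (m / total)
    new_m2 = m2 + chunk_m2 + delta ** 2 * (count * m / total)
    return total, new_mean, new_m2


# A batch of 4 independent variables, 10 samples each, fed in as
# chunks of 3, 4, and 3 samples along axis 0.
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))
count, mean, m2 = 0, np.zeros(4), np.zeros(4)
for chunk in np.split(data, [3, 7], axis=0):
    count, mean, m2 = merge_chunk(count, mean, m2, chunk, axis=0)
variance = m2 / count  # population variance (ddof=0)
```

With a chunk of size M = 1 the same rule reduces to the usual one-sample update, which is why chunking is a strict generalization of stepping one sample at a time.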

To further illustrate the difference between batching and chunking, consider the following example:

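```python
import tensorflow as tf
import tensorflow_probability as tfp

# Treat as 3 samples from each of 5 independent vector random variables of
# shape (2,).
sample = tf.ones((3, 5, 2))
# Each incoming sample has shape (5, 2); event_ndims=1 treats the trailing
# (2,) as the event, so the leading 5 is a batch dimension.
running_cov = tfp.experimental.stats.RunningCovariance(
    (5, 2), event_ndims=1)
state = running_cov.initialize()
# axis=0 marks the 3 samples as one chunk.
state = running_cov.update(state, sample, axis=0)
final_cov = running_cov.finalize(state)
final_cov.shape  # (5, 2, 2)
```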
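As a cross-check of the shapes in the example, the same population covariance (ddof=0) can be computed in one pass with plain NumPy. This is a reference calculation, not TFP code, and the einsum subscripts are an illustrative choice: "n" is the chunk (sample) axis, "b" the batch axis, and "i", "j" index the event.

```python
import numpy as np

# Mirrors the example's layout: axis 0 holds 3 samples, axis 1 is a batch
# of 5 independent variables, and the trailing axis is the event of size 2.
sample = np.ones((3, 5, 2))
residuals = sample - sample.mean(axis=0)
# Sum outer products of residuals over the sample axis ("n"), keeping the
# batch axis ("b") and pairing event indices ("i", "j"); divide by N (ddof=0).
cov = np.einsum("nbi,nbj->bij", residuals, residuals) / sample.shape[0]
print(cov.shape)  # (5, 2, 2)
```

The chunk axis is summed away while the batch axis survives, which is why the result has one 2 x 2 covariance matrix per batch member.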

Args
state: A RunningCovarianceState representing the current state of running statistics.
new_sample: Incoming sample with shape and dtype compatible with those used to form the RunningCovarianceState.
axis: If chunking is desired, an integer specifying the axis along which samples are chunked. For individual samples, set this to None. Defaults to None (samples are not chunked).

Returns
state: A RunningCovarianceState with updated calculations.

References

[1]: Philippe Pebay. Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments. Technical Report SAND2008-6212, 2008. https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2008/086212.pdf