
tfp.stats.correlation

Sample correlation (Pearson) between observations indexed by event_axis.

tfp.stats.correlation(
    x,
    y=None,
    sample_axis=0,
    event_axis=-1,
    keepdims=False,
    name=None
)

Defined in python/stats/sample_stats.py.

Given N samples of scalar random variables X and Y, correlation may be estimated as

Corr[X, Y] := Cov[X, Y] / Sqrt(Cov[X, X] * Cov[Y, Y]),
where
Cov[X, Y] := N^{-1} sum_{n=1}^N (X_n - Xbar) Conj{(Y_n - Ybar)}
Xbar := N^{-1} sum_{n=1}^N X_n
Ybar := N^{-1} sum_{n=1}^N Y_n

Correlation is always in the interval [-1, 1], and Corr[X, X] == 1.
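As an illustration only, the estimator above can be sketched in plain Python for real-valued samples (so the conjugate is a no-op). The helper names `sample_cov` and `sample_corr` are hypothetical and are not part of `tfp.stats`.

```python
import math

def sample_cov(xs, ys):
    """Cov[X, Y] with the 1/N normalization used in the formula above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def sample_corr(xs, ys):
    """Corr[X, Y] = Cov[X, Y] / Sqrt(Cov[X, X] * Cov[Y, Y])."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.5]
corr_xy = sample_corr(xs, ys)                    # close to, but below, 1
corr_xx = sample_corr(xs, xs)                    # Corr[X, X] == 1
corr_neg = sample_corr(xs, [-x for x in xs])     # perfectly anti-correlated
```

The sketch fails with a division by zero if either input is constant, because the corresponding variance is then zero.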

For vector-variate random variables X = (X1, ..., Xd), Y = (Y1, ..., Yd), one is often interested in the correlation matrix, C_{ij} := Corr[Xi, Yj].
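A minimal pure-Python sketch of the correlation matrix, assuming real-valued samples stored as a list of N rows with d components each. The function name `corr_matrix` is hypothetical and only illustrates the definition C_{ij} := Corr[Xi, Yj].

```python
import math

def corr_matrix(x_samples, y_samples):
    """Return C with C[i][j] = Corr[X_i, Y_j] from N samples of d-vectors."""
    n = len(x_samples)
    d = len(x_samples[0])
    # Column k collects the N samples of component k.
    x_cols = [[row[k] for row in x_samples] for k in range(d)]
    y_cols = [[row[k] for row in y_samples] for k in range(d)]

    def cov(a, b):
        a_bar, b_bar = sum(a) / n, sum(b) / n
        return sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b)) / n

    return [[cov(xi, yj) / math.sqrt(cov(xi, xi) * cov(yj, yj))
             for yj in y_cols] for xi in x_cols]

x = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
c = corr_matrix(x, x)  # with y = x: unit diagonal, symmetric off-diagonals
```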

import tensorflow as tf
import tensorflow_probability as tfp

x = tf.random.normal(shape=(100, 2, 3))
y = tf.random.normal(shape=(100, 2, 3))

# corr[i, j] is the sample correlation between x[:, i, j] and y[:, i, j].
corr = tfp.stats.correlation(x, y, sample_axis=0, event_axis=None)

# corr_matrix[i, m, n] is the sample correlation of x[:, i, m] and y[:, i, n]
corr_matrix = tfp.stats.correlation(x, y, sample_axis=0, event_axis=-1)

Note that the covariance terms are normalized by N rather than N - 1 (as numpy.var does by default). This keeps the covariance terms finite when N = 1, at the cost of a slight bias.
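The effect of the normalization can be seen in a small pure-Python sketch. The helper `cov_with_ddof` and its `ddof` parameter (a name borrowed from NumPy) are hypothetical and only illustrate the two choices of divisor.

```python
def cov_with_ddof(xs, ys, ddof=0):
    """Sample covariance normalized by (N - ddof); hypothetical helper."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - ddof)

xs = [1.0, 2.0, 4.0, 7.0]
n = len(xs)
biased = cov_with_ddof(xs, xs, ddof=0)    # divide by N
unbiased = cov_with_ddof(xs, xs, ddof=1)  # divide by N - 1

# With a single sample, dividing by N gives 0.0, while dividing by N - 1
# is 0 / 0 (ZeroDivisionError in pure Python; array libraries typically
# return NaN instead).
single = cov_with_ddof([3.0], [3.0], ddof=0)
```

The two estimates differ only by the factor (N - 1) / N, so the gap shrinks as N grows.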

Args:

  • x: A numeric Tensor holding samples.
  • y: Optional Tensor with same dtype and shape as x. Default value: None (y is effectively set to x).
  • sample_axis: Scalar or vector Tensor designating the axis or axes holding samples, or None (meaning all axes hold samples). Default value: 0 (leftmost dimension).
  • event_axis: Scalar or vector Tensor, or None (scalar events). Axis indexing random events, whose correlation we are interested in. If a vector, entries must form a contiguous block of dims. sample_axis and event_axis should not intersect. Default value: -1 (rightmost axis holds events).
  • keepdims: Boolean. Whether to keep the sample axes as singleton dimensions. Default value: False.
  • name: Python str name prefixed to Ops created by this function. Default value: None (i.e., 'correlation').

Returns:

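  • corr: A Tensor of the same dtype as x. With keepdims=False, its rank equals rank(x) - len(sample_axis) + len(event_axis): the sample axes are reduced away and each event axis appears twice. For example, x of shape (100, 2, 3) with sample_axis=0 and event_axis=-1 yields a result of shape (2, 3, 3).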

Raises:

  • AssertionError: If x and y are found to have different shapes.
  • ValueError: If sample_axis and event_axis are found to overlap.
  • ValueError: If event_axis is found to not be contiguous.
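
The axis constraints described above can be sketched as a standalone check. This is a hypothetical helper written from the documented rules, not the library's implementation. It assumes static Python integers or lists rather than Tensors, and it reads sample_axis=None literally as "all axes".

```python
def check_axes(rank, sample_axis, event_axis):
    """Validate axes against the documented constraints (illustrative only)."""
    def normalize(axis):
        if axis is None:
            return []
        axes = [axis] if isinstance(axis, int) else list(axis)
        # Map negative indices such as -1 to their non-negative equivalents.
        return sorted(a % rank for a in axes)

    # Assumption: None for sample_axis means every axis holds samples.
    sample = list(range(rank)) if sample_axis is None else normalize(sample_axis)
    event = normalize(event_axis)  # None means scalar events: no event axes

    if set(sample) & set(event):
        raise ValueError("sample_axis and event_axis overlap")
    if event and event != list(range(event[0], event[-1] + 1)):
        raise ValueError("event_axis is not contiguous")
    return sample, event

default_axes = check_axes(3, 0, -1)    # the defaults applied to a rank-3 input
scalar_events = check_axes(3, 0, None)
```

Calling `check_axes(4, 0, [1, 3])` raises ValueError because axes 1 and 3 do not form a contiguous block, and `check_axes(3, 0, 0)` raises ValueError because the two axes overlap.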