tft.CovarianceCombiner

Class CovarianceCombiner

Combines the PCollection to compute the biased covariance matrix.

__init__

__init__(
    numpy_dtype=np.float64,
    output_shape=None
)

Store the dtype for np arrays/matrices for precision.

Properties

accumulator_coder

Methods

add_input

add_input(
    accumulator,
    batch_values
)

Compute sum of input cross-terms, sum of inputs, and count.

The cross terms for a numeric 1d array x are given by the set: {z_ij = x_i * x_j for all indices i and j}. This is stored as a 2d array. Here next_input denotes the 2d array held in batch_values, i.e. an array of 1d numeric arrays. Because each row of next_input is one 1d array, matmul(transpose(next_input), next_input) sums the cross terms of every row in a single operation.

Args:

  • accumulator: running sum of cross terms, input vectors, and count
  • batch_values: entries from the pipeline, which must be a single-element list containing a 2d array representing multiple 1d arrays

Returns:

An accumulator with the input batch folded into its running sum_product, sum_vectors, and count of input rows.
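
The update described above can be sketched in plain NumPy. This is an illustrative sketch only, not the library's implementation: the helper name add_batch and the three-element list layout [sum_product, sum_vectors, count] are assumptions based on the description in this section.

```python
import numpy as np


def add_batch(accumulator, batch):
    """Hypothetical helper: fold one 2d batch into the running sums."""
    sum_product, sum_vectors, count = accumulator
    batch = np.asarray(batch, dtype=np.float64)
    return [
        # transpose(batch) @ batch sums the outer product of every row.
        sum_product + np.matmul(batch.T, batch),
        # Column-wise sum of the rows.
        sum_vectors + batch.sum(axis=0),
        # Number of rows seen so far.
        count + batch.shape[0],
    ]


batch = np.array([[1.0, 2.0], [3.0, 4.0]])
acc = add_batch([np.zeros((2, 2)), np.zeros(2), 0], batch)

# Reference: sum the cross terms row by row.
outer_sum = sum(np.outer(row, row) for row in batch)
```

Comparing acc[0] with outer_sum shows that the single matrix product matches the explicit sum of per-row cross terms.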

create_accumulator

create_accumulator()

Create an accumulator with all zero entries.

extract_output

extract_output(accumulator)

Run covariance logic on sum_product, sum of input vectors, and count.

The formula used to compute the covariance is cov(x) = E(xx^T) - uu^T, where x is the original input to the combiner and u = mean(x). E(xx^T) is computed by dividing the sum of cross terms (index 0 of the accumulator) by the count (index 2). u is computed by dividing the sum of rows (index 1) by the count (index 2).

Args:

  • accumulator: final accumulator as a list of the sum of cross-terms matrix, sum of input vectors, and count.

Returns:

A list containing a single 2d ndarray, the covariance matrix.
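
As a rough check of the formula cov(x) = E(xx^T) - uu^T, the following NumPy sketch computes the biased covariance from the three running sums and compares it with np.cov. The function name covariance_from_sums is hypothetical and the code is not taken from the library.

```python
import numpy as np


def covariance_from_sums(sum_product, sum_vectors, count):
    """Hypothetical helper: biased covariance from accumulated sums."""
    expected_cross = sum_product / count  # E(xx^T)
    mean = sum_vectors / count            # u
    return expected_cross - np.outer(mean, mean)


x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
cov = covariance_from_sums(np.matmul(x.T, x), x.sum(axis=0), x.shape[0])

# bias=True normalizes by N rather than N - 1, matching the biased estimate.
reference = np.cov(x, rowvar=False, bias=True)
```

Up to floating-point rounding, cov and reference should agree. Note that this one-pass formula can lose precision when the mean is large relative to the spread of the data, which is one reason the dtype passed to __init__ matters.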

merge_accumulators

merge_accumulators(accumulators)

Sums values in each accumulator entry.
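
Because every accumulator entry is a plain sum, merging amounts to element-wise addition. The sketch below assumes the accumulator is a list [sum_product, sum_vectors, count], as described for extract_output; the helper names are hypothetical.

```python
import numpy as np


def make_accumulator(batch):
    """Hypothetical helper: accumulator for a single 2d batch."""
    return [np.matmul(batch.T, batch), batch.sum(axis=0), batch.shape[0]]


def merge(accumulators):
    """Hypothetical helper: sum each entry across accumulators."""
    return [sum(parts) for parts in zip(*accumulators)]


x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])

# Accumulate two partitions separately, then merge them.
merged = merge([make_accumulator(x[:1]), make_accumulator(x[1:])])

# Accumulating the whole input at once gives the same result.
whole = make_accumulator(x)
```

The merged result is the same as processing all rows in one accumulator, which is what lets the computation be split across workers.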

output_tensor_infos

output_tensor_infos()