tft.PCACombiner

Class PCACombiner

Inherits From: CovarianceCombiner

Compute PCA of accumulated data using the biased covariance matrix.

__init__

__init__(
    output_dim=None,
    numpy_dtype=np.float64,
    output_shape=None
)

Store the PCA output dimension and the NumPy dtype used for precision.

Properties

accumulator_coder

Methods

add_input

add_input(
    accumulator,
    batch_values
)

Compute sum of input cross-terms, sum of inputs, and count.

The cross terms for a numeric 1d array x are given by the set {z_ij = x_i * x_j for all indices i and j}, which is stored as a 2d array. Here next_input denotes the 2d array carried in batch_values. Since next_input is an array of 1d numeric arrays (i.e. a 2d array), matmul(transpose(next_input), next_input) sums the cross terms of every 1d array in next_input in a single operation.

Args:

  • accumulator: running sum of cross terms, input vectors, and count
  • batch_values: entries from the pipeline, which must be a single-element list containing a 2d array representing multiple 1d arrays

Returns:

An accumulator with next_input folded into its running sum_product, sum_vectors, and count of input rows.
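
The accumulation step can be illustrated in plain NumPy. This is a minimal sketch, not the library's implementation: the function name add_input_sketch and the three-entry list layout [sum_product, sum_vectors, count] are assumptions made for the example.

```python
import numpy as np

def add_input_sketch(accumulator, batch_values):
    """Hypothetical stand-in for add_input; the accumulator layout is assumed."""
    # batch_values is a single-element list holding a 2d array of rows.
    (next_input,) = batch_values
    sum_product, sum_vectors, count = accumulator
    # transpose(next_input) @ next_input adds up the outer product of every row
    # with itself, i.e. the cross terms x_i * x_j summed over all rows.
    sum_product = sum_product + np.matmul(next_input.T, next_input)
    sum_vectors = sum_vectors + next_input.sum(axis=0)
    count = count + next_input.shape[0]
    return [sum_product, sum_vectors, count]

# Two rows of dimension 2, starting from an all-zero accumulator.
batch = np.array([[1.0, 2.0], [3.0, 4.0]])
zero = [np.zeros((2, 2)), np.zeros(2), 0]
acc = add_input_sketch(zero, [batch])
```

For this batch, acc holds the cross-term matrix [[10, 14], [14, 20]], the vector sum [4, 6], and a count of 2.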

create_accumulator

create_accumulator()

Create an accumulator with all zero entries.

extract_output

extract_output(accumulator)

Compute PCA of the accumulated data using the biased covariance matrix.

Following the covariance computation in CovarianceCombiner, this method runs eigenvalue decomposition on the covariance matrix, sorts eigenvalues in decreasing order, and returns the first output_dim corresponding eigenvectors (principal components) as a matrix.

Args:

  • accumulator: final accumulator as a list of the sum of cross-terms matrix, sum of input vectors, and count.

Returns:

A list containing a matrix of shape (input_dim, output_dim).
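
The computation described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the library's code: the name extract_output_sketch, the explicit output_dim argument, and the use of np.linalg.eigh for the decomposition are choices made for the example.

```python
import numpy as np

def extract_output_sketch(accumulator, output_dim):
    """Hypothetical stand-in for extract_output; accumulator layout is assumed."""
    sum_product, sum_vectors, count = accumulator
    mean = sum_vectors / count
    # Biased covariance: E[x x^T] - E[x] E[x]^T, dividing by count (not count - 1).
    covariance = sum_product / count - np.outer(mean, mean)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so reorder the indices to get decreasing eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    # Columns are eigenvectors; keep the first output_dim of them.
    return [eigenvectors[:, order[:output_dim]]]

# Three points on the line y = x, so the leading component lies along (1, 1).
rows = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
acc = [rows.T @ rows, rows.sum(axis=0), rows.shape[0]]
(components,) = extract_output_sketch(acc, output_dim=1)
```

Here components has shape (2, 1). Eigenvectors are only defined up to sign, so the single column may point along (1, 1) or (-1, -1) after normalization.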

merge_accumulators

merge_accumulators(accumulators)

Sums values in each accumulator entry.
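
Because every accumulator entry is a plain sum, partial accumulators can be combined by adding matching entries. A minimal sketch, assuming the same hypothetical [sum_product, sum_vectors, count] layout and an illustrative function name:

```python
import numpy as np

def merge_accumulators_sketch(accumulators):
    """Hypothetical stand-in for merge_accumulators; layout is assumed."""
    # zip(*...) groups the i-th entry of every accumulator; sum each group.
    return [sum(entries) for entries in zip(*accumulators)]

a = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([1.0, 1.0]), 2]
b = [np.array([[2.0, 2.0], [2.0, 2.0]]), np.array([2.0, 0.0]), 3]
merged = merge_accumulators_sketch([a, b])
```

The merged accumulator holds [[3, 2], [2, 3]], [3, 1], and a count of 5, the same totals that would result from feeding all rows through a single accumulator.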

output_tensor_infos

output_tensor_infos()