Class PCACombiner
Inherits From: CovarianceCombiner
Compute PCA of accumulated data using the biased covariance matrix.
__init__
__init__(
output_dim=None,
numpy_dtype=np.float64,
output_shape=None
)
Stores the PCA output dimension and the dtype used for numerical precision.
Properties
accumulator_coder
Methods
add_input
add_input(
accumulator,
batch_values
)
Compute sum of input cross-terms, sum of inputs, and count.
The cross terms for a numeric 1-D array x are given by the set {z_ij = x_i * x_j for all indices i and j}, which is stored as a 2-D array. The array in batch_values is itself a 2-D array X whose rows are 1-D numeric arrays, so matmul(transpose(X), X) sums the cross terms of every row in a single operation.
Args:
accumulator
: running sum of cross terms, sum of input vectors, and count
batch_values
: entries from the pipeline, which must be a single-element list containing a 2-D array representing multiple 1-D arrays
Returns:
An accumulator updated with the rows of batch_values, holding the running sum_product, sum_vectors, and count of input rows.
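The matmul identity described above can be checked with plain NumPy. This is a minimal sketch, not the library's implementation: the `batch` array and the variable names are illustrative assumptions, chosen to mirror the sum_product, sum_vectors, and count entries named in the return value.

```python
import numpy as np

# Hypothetical batch: three 1-D input vectors stacked as rows of a 2-D array.
batch = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# Cross terms computed one row at a time: the sum of the outer products x x^T.
explicit = sum(np.outer(row, row) for row in batch)

# The same quantity from a single matrix product, transpose(X) @ X.
sum_product = np.matmul(batch.T, batch)

# The other two accumulator entries: sum of the input vectors and the row count.
sum_vectors = batch.sum(axis=0)
count = batch.shape[0]

print(np.allclose(explicit, sum_product))
```

Both computations give the same 2-D array, which is why a single matmul per batch is enough to update the cross-term sum.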
create_accumulator
create_accumulator()
Create an accumulator with all zero entries.
extract_output
extract_output(accumulator)
Compute PCA of the accumulated data using the biased covariance matrix.
Following the covariance computation in CovarianceCombiner, this method runs eigenvalue decomposition on the covariance matrix, sorts eigenvalues in decreasing order, and returns the first output_dim corresponding eigenvectors (principal components) as a matrix.
Args:
accumulator
: final accumulator as a list of the sum of cross-terms matrix, sum of input vectors, and count.
Returns:
A list containing a matrix of shape (input_dim, output_dim).
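The steps described for extract_output can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the library's code: the function name `pca_from_accumulator` is hypothetical, and the use of `np.linalg.eigh` (suitable for symmetric matrices) is a choice made for this sketch. The biased covariance is taken as the mean of the cross terms minus the outer product of the mean vector with itself.

```python
import numpy as np

def pca_from_accumulator(sum_product, sum_vectors, count, output_dim):
    """Hypothetical helper: principal components from accumulated sums."""
    mean = sum_vectors / count
    # Biased covariance: divide by count (not count - 1).
    cov = sum_product / count - np.outer(mean, mean)
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort in decreasing order and keep the first output_dim eigenvectors.
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:output_dim]]

# Illustrative data with most of its variance along the first axis.
data = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
components = pca_from_accumulator(
    data.T @ data, data.sum(axis=0), data.shape[0], output_dim=1)
print(components.shape)
```

The result has shape (input_dim, output_dim). The sign of each eigenvector is arbitrary, so comparisons should be made up to sign.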
merge_accumulators
merge_accumulators(accumulators)
Sums values in each accumulator entry.
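Because every accumulator entry is a plain sum, merging partial accumulators entry by entry gives the same result as accumulating all rows at once. The sketch below illustrates this with NumPy. The helpers `accumulate` and `merge` are hypothetical names for this example, and the three-entry list layout follows the accumulator description given for extract_output.

```python
import numpy as np

def accumulate(batch):
    """Hypothetical helper: [sum of cross terms, sum of vectors, count]."""
    return [batch.T @ batch, batch.sum(axis=0), batch.shape[0]]

def merge(accumulators):
    """Sum matching entries across accumulators."""
    # zip(*...) groups the first entries together, then the second, and so on.
    return [np.sum(parts, axis=0) for parts in zip(*accumulators)]

# Two illustrative batches, as if processed by separate workers.
first = np.array([[1.0, 2.0], [3.0, 4.0]])
second = np.array([[5.0, 6.0]])

merged = merge([accumulate(first), accumulate(second)])
whole = accumulate(np.vstack([first, second]))

print(all(np.allclose(m, w) for m, w in zip(merged, whole)))
```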
output_tensor_infos
output_tensor_infos()