tfdv.CombinerStatsGenerator

A StatsGenerator which computes statistics using a combiner function.

This class computes statistics using a combiner function. It builds partial states by processing one batch of examples at a time, merges the partial states, and finally computes the statistics from the merged partial state.
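The batch, merge, and extract lifecycle described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TFDV code: `ExampleCountGenerator` and `run_combiner` are hypothetical names, a dict of per-feature columns stands in for an Arrow RecordBatch, and the output is a dict rather than a DatasetFeatureStatistics proto. In a real pipeline, Beam (not a hand-written loop) decides how batches are distributed across accumulators.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for an Arrow RecordBatch: feature name -> column,
# with one entry (a list of values, or None if missing) per example.
Batch = Dict[str, List[Optional[List[float]]]]


class ExampleCountGenerator:
    """Hypothetical generator whose partial state is a running example count."""

    def create_accumulator(self) -> int:
        return 0

    def add_input(self, accumulator: int, batch: Batch) -> int:
        # Every column has one entry per example, so any column gives the row count.
        num_rows = len(next(iter(batch.values()))) if batch else 0
        return accumulator + num_rows

    def merge_accumulators(self, accumulators: Iterable[int]) -> int:
        return sum(accumulators)

    def extract_output(self, accumulator: int) -> Dict[str, int]:
        return {"num_examples": accumulator}


def run_combiner(generator, shards: List[List[Batch]]) -> Dict[str, int]:
    """Hypothetical driver mimicking what a Beam runner does with a combiner."""
    partial_states = []
    for shard in shards:
        accumulator = generator.create_accumulator()
        for batch in shard:  # fold one batch of examples at a time
            accumulator = generator.add_input(accumulator, batch)
        partial_states.append(accumulator)
    merged = generator.merge_accumulators(partial_states)
    return generator.extract_output(merged)


shards = [
    [{"age": [[30.0], None]}, {"age": [[40.0]]}],  # worker 1: two batches
    [{"age": [None, [50.0], [60.0]]}],             # worker 2: one batch
]
print(run_combiner(ExampleCountGenerator(), shards))  # {'num_examples': 6}
```

Because each shard gets its own accumulator, partial states can be built in parallel and are only combined in the merge step.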

This object mirrors a beam.CombineFn, except for the add_input interface, which is expected to be defined by its subclasses. Specifically, the generator must implement the following four methods:

create_accumulator(): Initializes an accumulator to store the partial state and returns it.

add_input(accumulator, input_record_batch): Incorporates a batch of input examples (represented as an Arrow RecordBatch) into the current accumulator and returns the updated accumulator.

merge_accumulators(accumulators): Merges the partial states in the accumulators and returns the accumulator containing the merged state.

extract_output(accumulator): Computes statistics from the partial state in the accumulator and returns the result as a DatasetFeatureStatistics proto.
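A subclass needs only the four methods listed above plus the constructor arguments. The sketch below defines a hypothetical stand-in base class so it runs without TFDV installed; with TFDV you would subclass `tfdv.CombinerStatsGenerator` directly and return a DatasetFeatureStatistics proto from `extract_output`. `MissingCountGenerator` and the dict-of-columns batch format are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional


# Hypothetical stand-in for tfdv.CombinerStatsGenerator so the sketch runs
# without TFDV; it only stores the constructor arguments.
class CombinerStatsGenerator:
    def __init__(self, name: str, schema=None):
        self._name = name
        self._schema = schema

    @property
    def name(self) -> str:
        return self._name

    @property
    def schema(self):
        return self._schema


# Stand-in for an Arrow RecordBatch: feature name -> one entry per example,
# either a list of values or None when the feature is missing.
Batch = Dict[str, List[Optional[List[float]]]]


class MissingCountGenerator(CombinerStatsGenerator):
    """Hypothetical generator counting, per feature, examples where it is missing."""

    def create_accumulator(self) -> Dict[str, int]:
        return {}

    def add_input(self, accumulator: Dict[str, int], batch: Batch) -> Dict[str, int]:
        for feature, column in batch.items():
            missing = sum(1 for value in column if value is None)
            accumulator[feature] = accumulator.get(feature, 0) + missing
        return accumulator

    def merge_accumulators(self, accumulators: Iterable[Dict[str, int]]) -> Dict[str, int]:
        merged: Dict[str, int] = {}
        for acc in accumulators:
            for feature, count in acc.items():
                merged[feature] = merged.get(feature, 0) + count
        return merged

    def extract_output(self, accumulator: Dict[str, int]) -> Dict[str, int]:
        # A real generator would build a DatasetFeatureStatistics proto here.
        return dict(sorted(accumulator.items()))


gen = MissingCountGenerator("missing_counts")
acc1 = gen.add_input(gen.create_accumulator(), {"a": [[1.0], None], "b": [None, None]})
acc2 = gen.add_input(gen.create_accumulator(), {"a": [None]})
print(gen.extract_output(gen.merge_accumulators([acc1, acc2])))  # {'a': 2, 'b': 2}
```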

Args
name: A unique name associated with the statistics generator.
schema: An optional schema for the dataset.

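Attributes
name: The unique name of the statistics generator.
schema: The schema for the dataset, if one was provided.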

Methods

add_input

Returns the result of folding a batch of inputs into the accumulator.

Args
accumulator: The current accumulator.
input_record_batch: An Arrow RecordBatch whose columns are features and whose rows are examples. Each column is of type List or Null (a column is of Null type when the feature's value is None for every example in the batch).

Returns
The accumulator after updating the statistics for the batch of inputs.
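The sketch below shows one way `add_input` might fold a batch while tolerating missing values, including a feature that is None for every example (which TFDV would pass as a Null-typed column). It is written as a standalone function for brevity; the function name, the `(sum, count)` accumulator shape, and the dict-of-columns batch standing in for an Arrow RecordBatch are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for an Arrow RecordBatch: feature name -> one entry per
# example, either a list of numeric values or None when the feature is missing.
Batch = Dict[str, List[Optional[List[float]]]]
# Per-feature partial state: (sum of values, number of values).
Accumulator = Dict[str, Tuple[float, int]]


def add_input(accumulator: Accumulator, batch: Batch) -> Accumulator:
    """Folds one batch into the accumulator, skipping missing values.

    A feature that is None for every example contributes no values, but in
    this sketch it still gets a (0.0, 0) entry so the feature is recorded.
    """
    for feature, column in batch.items():
        total, count = accumulator.get(feature, (0.0, 0))
        for values in column:
            if values is None:
                continue  # feature missing for this example
            total += sum(values)
            count += len(values)
        accumulator[feature] = (total, count)
    return accumulator


acc = add_input({}, {"age": [[30.0], None, [40.0, 50.0]], "note": [None, None, None]})
print(acc)  # {'age': (120.0, 3), 'note': (0.0, 0)}
```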

create_accumulator

Returns a fresh, empty accumulator.

Returns
An empty accumulator.

extract_output

Returns the result of converting the accumulator into the output value.

Args
accumulator: The final accumulator value.

Returns
A DatasetFeatureStatistics proto representing the result of this stats generator.
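As a sketch of `extract_output`, the hypothetical function below turns per-feature `(sum, count)` partial state into a final value count and mean. A real generator would populate a DatasetFeatureStatistics proto (from tensorflow_metadata) rather than return a dict; the function name and output keys here are illustrative assumptions.

```python
from typing import Dict, Tuple

# Per-feature partial state: (sum of values, number of values).
Accumulator = Dict[str, Tuple[float, int]]


def extract_output(accumulator: Accumulator) -> Dict[str, dict]:
    """Turns per-feature (sum, count) partial state into final statistics.

    Returns a plain dict so the sketch runs without TFDV.
    """
    output = {}
    for feature, (total, count) in sorted(accumulator.items()):
        output[feature] = {
            "num_values": count,
            # Avoid dividing by zero for features that never had a value.
            "mean": total / count if count else None,
        }
    return output


print(extract_output({"age": (120.0, 3), "note": (0.0, 0)}))
```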

merge_accumulators

Merges several accumulators into a single accumulator value.

Args
accumulators: The accumulators to merge.

Returns
The merged accumulator.
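A sketch of `merge_accumulators` for the same hypothetical `(sum, count)` accumulator. Because a runner may merge partial states in any grouping or order, the merge should be associative and commutative; element-wise addition satisfies this up to floating-point rounding. This version builds a new dict and leaves its inputs unchanged, which is a design choice of the sketch rather than a stated TFDV requirement.

```python
from typing import Dict, Iterable, Tuple

# Per-feature partial state: (sum of values, number of values).
Accumulator = Dict[str, Tuple[float, int]]


def merge_accumulators(accumulators: Iterable[Accumulator]) -> Accumulator:
    """Merges per-feature (sum, count) partial states into one accumulator."""
    merged: Accumulator = {}
    for acc in accumulators:
        for feature, (total, count) in acc.items():
            m_total, m_count = merged.get(feature, (0.0, 0))
            merged[feature] = (m_total + total, m_count + count)
    return merged


a = {"age": (30.0, 1)}
b = {"age": (90.0, 2), "note": (0.0, 0)}
c = {"note": (5.0, 1)}
# Grouping and order do not change the result here.
print(merge_accumulators([a, b, c]) == merge_accumulators([c, merge_accumulators([b, a])]))
```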