A StatsGenerator which computes statistics using a combiner function.

The generator processes one batch of examples at a time, producing a partial state for each batch. It then merges the partial states and, at the end, computes the statistics from the merged partial state.

This object mirrors a beam.CombineFn except for the add_input interface, which is expected to be defined by its subclasses. Specifically, the generator must implement the following four methods:

create_accumulator(): Initializes an accumulator to store the partial state and returns it.

add_input(accumulator, input_record_batch): Incorporates a batch of input examples (represented as an Arrow RecordBatch) into the current accumulator and returns the updated accumulator.

merge_accumulators(accumulators): Merges the partial states in the accumulators and returns an accumulator containing the merged state.

extract_output(accumulator): Computes statistics from the partial state in the accumulator and returns the result as a DatasetFeatureStatistics proto.
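The four methods above can be sketched in plain Python as follows. This is a minimal illustration, not TFDV's implementation: the class `ExampleCountGenerator` is hypothetical, a dict of column lists stands in for the Arrow RecordBatch, and a plain dict stands in for the DatasetFeatureStatistics proto, so the sketch runs without TFDV or Arrow installed.

```python
from typing import Dict, Iterable, List, Optional

# Stand-in for an Arrow RecordBatch (assumption): feature name -> one entry per
# example, where each entry is a list of values or None if the feature is missing.
Batch = Dict[str, List[Optional[list]]]


class ExampleCountGenerator:
    """Hypothetical generator counting non-missing examples per feature."""

    def __init__(self, name: str = "ExampleCountGenerator") -> None:
        self.name = name

    def create_accumulator(self) -> Dict[str, int]:
        # Fresh, empty partial state.
        return {}

    def add_input(self, accumulator: Dict[str, int], batch: Batch) -> Dict[str, int]:
        # Fold one batch into the partial state and return it.
        for feature, column in batch.items():
            present = sum(1 for values in column if values is not None)
            accumulator[feature] = accumulator.get(feature, 0) + present
        return accumulator

    def merge_accumulators(
        self, accumulators: Iterable[Dict[str, int]]
    ) -> Dict[str, int]:
        # Combine the partial states produced for different batches.
        merged = self.create_accumulator()
        for acc in accumulators:
            for feature, count in acc.items():
                merged[feature] = merged.get(feature, 0) + count
        return merged

    def extract_output(self, accumulator: Dict[str, int]) -> dict:
        # A real generator would build a DatasetFeatureStatistics proto here.
        return {
            "features": [
                {"name": name, "num_non_missing": count}
                for name, count in sorted(accumulator.items())
            ]
        }


gen = ExampleCountGenerator()
batch_a = {"age": [[31], None, [45]], "city": [["Oslo"], ["Rome"], None]}
batch_b = {"age": [[22]], "city": [None]}

# Each batch gets its own partial state; the states are merged afterwards.
acc_a = gen.add_input(gen.create_accumulator(), batch_a)
acc_b = gen.add_input(gen.create_accumulator(), batch_b)
result = gen.extract_output(gen.merge_accumulators([acc_a, acc_b]))
print(result)
```

In a real pipeline the runner, not user code, decides how batches are distributed and when partial states are merged; the manual calls at the bottom only make that order visible.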

name: A unique name associated with the statistics generator.
schema: An optional schema for the dataset.





add_input(accumulator, input_record_batch)

Returns the result of folding a batch of inputs into the accumulator.

accumulator: The current accumulator, which may be modified and returned for efficiency.
input_record_batch: An Arrow RecordBatch whose columns are features and whose rows are examples. The columns are of type List or Null (if a feature's value is None across all the examples in the batch, its corresponding column is of Null type).

The accumulator after updating the statistics for the batch of inputs.
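As a rough illustration of the two points above (in-place updates of the accumulator and all-None columns), the following sketch uses plain Python lists in place of Arrow columns. The helper `is_null_column` and the value-counting logic are assumptions made for the example, not part of the library.

```python
from typing import Dict, List, Optional

Column = List[Optional[list]]  # stand-in for an Arrow column (assumption)


def is_null_column(column: Column) -> bool:
    # Mirrors the Null-type case: no example in the batch has a value.
    return all(values is None for values in column)


def add_input(accumulator: Dict[str, int], batch: Dict[str, Column]) -> Dict[str, int]:
    """Adds the number of values seen per feature to the accumulator."""
    for feature, column in batch.items():
        if is_null_column(column):
            # Record that the feature exists, but contribute no values.
            accumulator.setdefault(feature, 0)
            continue
        total = sum(len(values) for values in column if values is not None)
        accumulator[feature] = accumulator.get(feature, 0) + total
    # The same dict is mutated and returned, avoiding a copy per batch.
    return accumulator


acc: Dict[str, int] = {}
batch = {"tags": [["a", "b"], None, ["c"]], "notes": [None, None, None]}
returned = add_input(acc, batch)
print(returned)
print(returned is acc)
```

Returning the mutated accumulator, rather than a copy, is what the "modified and returned for efficiency" wording permits; callers should therefore not rely on the accumulator they passed in being left unchanged.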


compact(accumulator)

Returns a compact representation of the accumulator.

This is optionally called before an accumulator is sent across the wire. The base-class implementation is a no-op; derived classes may override it.

accumulator: The accumulator to compact.

The compacted accumulator. By default, this is the input accumulator, unchanged.
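One case where overriding this method might pay off is an accumulator that buffers raw values. The sketch below is hypothetical (the class `BufferedSumAccumulator` and its fields are invented for illustration): compacting folds the buffer into running totals so that less data has to be serialized.

```python
from typing import List


class BufferedSumAccumulator:
    """Hypothetical accumulator that buffers raw values before summing them."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0
        self.pending: List[float] = []  # raw values not yet folded in

    def flush(self) -> None:
        # Fold the buffered values into the running totals.
        self.total += sum(self.pending)
        self.count += len(self.pending)
        self.pending = []


def compact(accumulator: BufferedSumAccumulator) -> BufferedSumAccumulator:
    # After flushing, only two numbers and an empty list remain to be sent.
    accumulator.flush()
    return accumulator


acc = BufferedSumAccumulator()
acc.pending.extend([1.0, 2.0, 3.5])
compacted = compact(acc)
print(compacted.total, compacted.count, compacted.pending)
```

Because the call is optional, the other methods must still work correctly on an accumulator that was never compacted.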


create_accumulator()

Returns a fresh, empty accumulator.

An empty accumulator.


extract_output(accumulator)

Returns the result of converting the accumulator into the output value.

accumulator: The final accumulator value.

A proto representing the result of this stats generator.


merge_accumulators(accumulators)

Merges several accumulators into a single accumulator value.

accumulators: The accumulators to merge.

The merged accumulator.
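A runner is generally free to merge partial states in any grouping, so the merged result should not depend on how the accumulators are grouped. The sketch below assumes a hypothetical `MeanAccumulator` that stores a sum and a count rather than a mean, which is what makes the merge grouping-independent.

```python
from typing import Iterable, NamedTuple


class MeanAccumulator(NamedTuple):
    """Hypothetical partial state for computing a mean."""

    total: float
    count: int


def merge_accumulators(accumulators: Iterable[MeanAccumulator]) -> MeanAccumulator:
    total, count = 0.0, 0
    # Iterate only once: the input may be a one-shot iterable.
    for acc in accumulators:
        total += acc.total
        count += acc.count
    return MeanAccumulator(total, count)


a = MeanAccumulator(10.0, 2)
b = MeanAccumulator(6.0, 1)
c = MeanAccumulator(4.0, 2)

all_at_once = merge_accumulators([a, b, c])
in_two_steps = merge_accumulators([merge_accumulators([a, b]), c])
print(all_at_once == in_two_steps)
print(all_at_once.total / all_at_once.count)
```

Storing the mean itself would lose the per-state weights, and merging would then give different answers for different groupings.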


setup()

Prepares an instance for combining.

Subclasses should put costly initializations here instead of in __init__(), so that 1) the cost is properly recognized by Beam as setup cost (per worker) and 2) the cost is not paid at pipeline construction time.
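The split between a cheap constructor and a costly setup step might look like the following. This is a sketch only: `LookupGenerator` and its lookup table are hypothetical, and the class does not inherit from the real base class, so it runs without TFDV or Beam installed.

```python
from typing import Dict, Iterable, Optional


class LookupGenerator:
    """Hypothetical generator that needs a costly lookup table."""

    def __init__(self, name: str) -> None:
        # Keep the constructor cheap: it runs at pipeline construction time.
        self.name = name
        self._lookup: Optional[Dict[int, int]] = None

    def setup(self) -> None:
        # Costly initialization, deferred until combining is about to start.
        self._lookup = {i: i * i for i in range(100_000)}

    def create_accumulator(self) -> int:
        return 0

    def add_input(self, accumulator: int, keys: Iterable[int]) -> int:
        # Relies on setup() having been called before any input arrives.
        assert self._lookup is not None, "setup() must run before add_input()"
        return accumulator + sum(self._lookup[k] for k in keys)


gen = LookupGenerator("lookup")
built_without_lookup = gen._lookup is None  # nothing costly has happened yet
gen.setup()
total = gen.add_input(gen.create_accumulator(), [2, 3])
print(built_without_lookup, total)
```

Keeping the constructor cheap also keeps the object small when it is serialized and shipped to workers, since the table is built only after it arrives.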