A StatsGenerator which computes statistics using a combiner function.
tfdv.CombinerStatsGenerator( name: Text, schema: Optional[schema_pb2.Schema] = None ) -> None
This class computes statistics using a combiner function. It processes one batch of examples at a time to produce partial states, merges those partial states, and finally computes the statistics from the merged partial state.
This object mirrors a beam.CombineFn except for the add_input interface, which is expected to be defined by its sub-classes. Specifically, the generator must implement the following four methods:
create_accumulator(): Initializes an accumulator to store the partial state and returns it.
add_input(accumulator, input_record_batch): Incorporates a batch of input examples (represented as an Arrow RecordBatch) into the current accumulator and returns the updated accumulator.
merge_accumulators(accumulators): Merges the partial states in the accumulators and returns the accumulator containing the merged state.
extract_output(accumulator): Computes statistics from the partial state in the accumulator and returns the result as a DatasetFeatureStatistics proto.
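The four-method contract above can be sketched in plain Python. This is a minimal illustration, not TFDV code: the class `NonMissingCountSketch`, the driver `run_sketch`, and the `Batch` alias are hypothetical names, a dict of Python lists stands in for the Arrow RecordBatch, and a plain dict stands in for the DatasetFeatureStatistics proto. A real generator would subclass tfdv.CombinerStatsGenerator and be driven by Beam rather than by a loop.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: a dict of Python lists stands in for an Arrow RecordBatch.
# Each column holds one list of values per example; None marks a missing value.
Batch = Dict[str, List[Optional[List[float]]]]
Accumulator = Dict[str, int]


class NonMissingCountSketch:
    """Hypothetical generator that counts non-missing values per feature."""

    def create_accumulator(self) -> Accumulator:
        # Fresh, empty partial state.
        return {}

    def add_input(self, accumulator: Accumulator,
                  input_record_batch: Batch) -> Accumulator:
        # Fold one batch into the partial state and return the updated state.
        for feature, rows in input_record_batch.items():
            present = sum(1 for row in rows if row is not None)
            accumulator[feature] = accumulator.get(feature, 0) + present
        return accumulator

    def merge_accumulators(self,
                           accumulators: Iterable[Accumulator]) -> Accumulator:
        # Combine partial states that were built independently.
        merged: Accumulator = {}
        for acc in accumulators:
            for feature, count in acc.items():
                merged[feature] = merged.get(feature, 0) + count
        return merged

    def extract_output(self, accumulator: Accumulator) -> Dict[str, int]:
        # A real generator would build a DatasetFeatureStatistics proto here.
        return dict(sorted(accumulator.items()))


def run_sketch(generator, batches, num_shards: int = 2):
    """Simulates a runner: one accumulator per shard, then merge and extract."""
    shards = [generator.create_accumulator() for _ in range(num_shards)]
    for i, batch in enumerate(batches):
        shard = i % num_shards
        shards[shard] = generator.add_input(shards[shard], batch)
    return generator.extract_output(generator.merge_accumulators(shards))
```

In this sketch the batches are spread over shards round-robin to mimic parallel workers; the final result does not depend on how the batches are assigned, because the per-feature counts are simply summed when the shards are merged.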
|name|A unique name associated with the statistics generator.|
|schema|An optional schema for the dataset.|
add_input( accumulator: ACCTYPE, input_record_batch: pa.RecordBatch ) -> ACCTYPE
Returns the result of folding a batch of inputs into the accumulator.
|accumulator|The current accumulator.|
|input_record_batch|An Arrow RecordBatch whose columns are features and rows are examples. The columns are of type List|
|Returns|The accumulator after updating the statistics for the batch of inputs.|
create_accumulator() -> ACCTYPE
Returns a fresh, empty accumulator.
|Returns|An empty accumulator.|
extract_output( accumulator: ACCTYPE ) -> statistics_pb2.DatasetFeatureStatistics
Returns the result of converting the accumulator into the output value.
|accumulator|The final accumulator value.|
|Returns|A proto representing the result of this stats generator.|
merge_accumulators( accumulators: Iterable[ACCTYPE] ) -> ACCTYPE
Merges several accumulators into a single accumulator value.
|accumulators|The accumulators to merge.|
|Returns|The merged accumulator.|
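Because partial states are typically merged in whatever grouping the runner chooses, as with a beam.CombineFn, an accumulator generally needs to hold state that merges to the same result regardless of grouping or order. The sketch below illustrates this for a mean, under the assumption that the accumulator is a (sum, count) pair; the `MeanAcc` alias and the function `merge_mean_accumulators` are hypothetical and not part of TFDV.

```python
from typing import Iterable, Tuple

# Assumption: the partial state for a mean is (running sum, value count).
# Storing the mean itself would not merge correctly across unequal shards.
MeanAcc = Tuple[float, int]


def merge_mean_accumulators(accumulators: Iterable[MeanAcc]) -> MeanAcc:
    """Adds sums and counts so that any grouping yields the same state."""
    total, count = 0.0, 0
    for acc_sum, acc_count in accumulators:
        total += acc_sum
        count += acc_count
    return total, count


a, b, c = (10.0, 4), (6.0, 1), (2.0, 3)

# Two different merge groupings and orders give the same merged state.
left = merge_mean_accumulators([merge_mean_accumulators([a, b]), c])
right = merge_mean_accumulators([a, merge_mean_accumulators([c, b])])
assert left == right == (18.0, 8)
```

The mean itself would then be derived only in extract_output, by dividing the merged sum by the merged count.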