tfx.v1.components.StatisticsGen

Official TFX StatisticsGen component.

Inherits From: BaseComponent, BaseNode

The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Apache Beam and approximate algorithms to scale to large datasets.

Example

  from tfx.v1.components import StatisticsGen

  # Computes statistics over data for visualization and example validation.
  statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])

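Component outputs contain:

  statistics: Channel of type standard_artifacts.ExampleStatistics holding the statistics of each split provided in the input examples.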

Please see the StatisticsGen guide for more details.

Args
examples A BaseChannel of ExamplesPath type, likely generated by the ExampleGen component. This needs to contain two splits labeled train and eval. Required.
schema A Schema channel used to automatically configure the stats options passed to TFDV.
stats_options The StatsOptions instance that configures optional TFDV behavior. When stats_options.schema is set, it is used instead of the schema channel input. Because stats_options must be serialized, slicer functions and custom stats generators are not usable, and an error is raised if either is specified.
exclude_splits Names of splits for which statistics and samples should not be generated. By default (when exclude_splits is None), no splits are excluded.
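The optional arguments above can be combined as in the following minimal sketch. It assumes TFX and TensorFlow Data Validation are installed and that an upstream example_gen component exists. The helper names optional_statistics_gen_kwargs and build_statistics_gen, the sample_rate value, and the choice to exclude the eval split are illustrative assumptions, not part of the TFX API.

```python
def optional_statistics_gen_kwargs(schema=None, stats_options=None,
                                   exclude_splits=None):
    """Collects only the optional StatisticsGen arguments that were set."""
    candidates = {
        "schema": schema,
        "stats_options": stats_options,
        "exclude_splits": exclude_splits,
    }
    return {name: value for name, value in candidates.items()
            if value is not None}


def build_statistics_gen(example_gen, schema_channel=None):
    """Builds a StatisticsGen that skips the 'eval' split (hypothetical choice)."""
    # Imports are deferred so the pure helper above works without TFX installed.
    import tensorflow_data_validation as tfdv
    from tfx.v1.components import StatisticsGen

    # Only serializable options are allowed here: no slice functions and
    # no custom stats generators. The sample rate is an arbitrary example.
    stats_options = tfdv.StatsOptions(sample_rate=0.1)

    kwargs = optional_statistics_gen_kwargs(
        schema=schema_channel,
        stats_options=stats_options,
        exclude_splits=["eval"],
    )
    return StatisticsGen(examples=example_gen.outputs["examples"], **kwargs)
```

Leaving an argument out of the call is equivalent to passing None, so the helper simply drops unset values before they reach the constructor.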

Attributes
outputs The component's output channel dict.

Methods

with_beam_pipeline_args

Adds per-component Beam pipeline args.

Args
beam_pipeline_args List of Beam pipeline args to be added to the Beam executor spec.

Returns
The component itself.
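Because with_beam_pipeline_args returns the component, it can be chained directly onto the constructor. The following is a minimal sketch, assuming the Beam DirectRunner and its --direct_num_workers and --direct_running_mode flags; the helper names direct_runner_args and build_parallel_statistics_gen and the worker count are hypothetical.

```python
def direct_runner_args(num_workers):
    """Builds Beam DirectRunner flags (flag choice is an illustrative assumption)."""
    return [
        "--runner=DirectRunner",
        f"--direct_num_workers={num_workers}",
        "--direct_running_mode=multi_processing",
    ]


def build_parallel_statistics_gen(example_gen, num_workers=4):
    """Creates a StatisticsGen whose Beam executor receives the flags above."""
    # Import is deferred so direct_runner_args works without TFX installed.
    from tfx.v1.components import StatisticsGen

    # with_beam_pipeline_args returns the component, so the result of the
    # chained call is still the StatisticsGen instance.
    return StatisticsGen(
        examples=example_gen.outputs["examples"]
    ).with_beam_pipeline_args(direct_runner_args(num_workers))
```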