

Computes statistics over input training data for example validation.

Inherits From: BaseExecutor

The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Beam and appropriate algorithms to scale to large datasets.

To include StatisticsGen in a TFX pipeline, configure your pipeline similar to

Child Classes

class Context




Computes stats for each split of input using tensorflow_data_validation.

input_dict Input dict from input key to a list of Artifacts.

  • examples: A list of type standard_artifacts.Examples. This should contain both the 'train' and 'eval' splits.
  • schema: Optionally, a list of type standard_artifacts.Schema. When the stats_options exec_property also contains a schema, this input should not be provided.

output_dict Output dict from output key to a list of Artifacts.

  • statistics: A list of type standard_artifacts.ExampleStatistics. This should contain both the 'train' and 'eval' splits.

exec_properties A dict of execution properties.

  • stats_options_json: Optionally, a JSON representation of StatsOptions. When a schema is provided as an input, the StatsOptions value should not also contain a schema.
  • exclude_splits: JSON-serialized list of names of splits where statistics and samples should not be generated.

Raises

ValueError when a schema is provided both as an input and as part of the StatsOptions exec_property.
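
The argument contract above can be illustrated with a small standard-library sketch. It is not the executor's actual implementation: the helper names (build_exec_properties, check_schema_conflict, splits_to_process) are hypothetical, a plain dict with an assumed "schema" key stands in for StatsOptions, and plain strings stand in for Artifact objects. It only shows how the JSON-serialized exec_properties and the schema-conflict rule might fit together.

```python
import json


def build_exec_properties(exclude_splits=None, stats_options=None):
    """Hypothetical helper: serialize execution properties to JSON strings.

    `stats_options` is a plain dict standing in for StatsOptions; the real
    StatsOptions JSON layout is not reproduced here.
    """
    return {
        "stats_options_json": (
            json.dumps(stats_options) if stats_options is not None else None
        ),
        "exclude_splits": json.dumps(exclude_splits or []),
    }


def check_schema_conflict(input_dict, exec_properties):
    """Raise ValueError when a schema arrives through both channels."""
    has_input_schema = bool(input_dict.get("schema"))
    raw_options = exec_properties.get("stats_options_json")
    options = json.loads(raw_options) if raw_options else {}
    # "schema" is an assumed key name used only for this illustration.
    if has_input_schema and options.get("schema") is not None:
        raise ValueError(
            "A schema was provided both as an input and in the stats options."
        )


def splits_to_process(all_splits, exec_properties):
    """Return the splits that are not listed in exclude_splits."""
    excluded = json.loads(exec_properties.get("exclude_splits") or "[]")
    return [split for split in all_splits if split not in excluded]


if __name__ == "__main__":
    props = build_exec_properties(exclude_splits=["eval"])
    check_schema_conflict({"examples": ["examples_artifact"]}, props)
    print(splits_to_process(["train", "eval"], props))
```

Keeping exclude_splits as a JSON string rather than a Python list mirrors the description above, where execution properties are passed in serialized form and decoded inside the executor.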