


Computes statistics over input training data for example validation.

Inherits From: BaseExecutor


The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Apache Beam and appropriate algorithms to scale to large datasets.

To include StatisticsGen in a TFX pipeline, configure your pipeline similarly to the example at https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_simple.py#L75.

Child Classes

class Context




    Do(input_dict, output_dict, exec_properties)

Computes stats for each split of input using tensorflow_data_validation.
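As a rough illustration of the per-split pattern, the sketch below loops over splits and computes a few simple per-feature statistics with the Python standard library. It is a toy stand-in, not the tensorflow_data_validation implementation: the function name `compute_split_stats`, the in-memory record format, and the chosen statistics are all assumptions made for this example.

```python
from statistics import mean


def compute_split_stats(splits):
    """Toy stand-in: count/mean/min/max per numeric feature, per split.

    `splits` maps a split name (e.g. 'train') to a list of dict records.
    This only illustrates the per-split loop; it is not how TFDV works.
    """
    result = {}
    for split_name, records in splits.items():
        # Gather the values of each feature into a column.
        columns = {}
        for record in records:
            for feature, value in record.items():
                columns.setdefault(feature, []).append(value)
        # Summarize each column independently for this split.
        result[split_name] = {
            feature: {
                "count": len(values),
                "mean": mean(values),
                "min": min(values),
                "max": max(values),
            }
            for feature, values in columns.items()
        }
    return result


# Hypothetical data with the two splits named in this reference.
example_splits = {
    "train": [{"fare": 10.0}, {"fare": 20.0}],
    "eval": [{"fare": 5.0}],
}
split_stats = compute_split_stats(example_splits)
```

Each split gets its own statistics entry, which is what makes it possible to compare 'train' against 'eval' later.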


  • input_dict: Input dict from input key to a list of Artifacts.
    • input_data: A list of type standard_artifacts.Examples. This should contain both the 'train' and 'eval' splits.
    • schema: Optionally, a list of type standard_artifacts.Schema. When the stats_options exec_property also contains a schema, this input should not be provided.
  • output_dict: Output dict from output key to a list of Artifacts.
  • exec_properties: A dict of execution properties.
    • stats_options_json: Optionally, a JSON representation of StatsOptions. When a schema is provided as an input, the StatsOptions value should not also contain a schema.


Raises ValueError when a schema is provided both as an input and as part of the StatsOptions exec_property.
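The schema constraint can be sketched as a small check. This is an illustration under stated assumptions, not the executor's code: the helper name `check_schema_conflict` is hypothetical, placeholder strings stand in for artifact objects, and the top-level `schema` key in the JSON is assumed because the serialized StatsOptions layout is not shown in this reference.

```python
import json


def check_schema_conflict(input_dict, exec_properties):
    """Raise ValueError if a schema arrives both as an input and in the options."""
    has_input_schema = bool(input_dict.get("schema"))
    options_json = exec_properties.get("stats_options_json")
    options = json.loads(options_json) if options_json else {}
    # Assumption: a schema inside the options appears under a top-level "schema" key.
    has_options_schema = options.get("schema") is not None
    if has_input_schema and has_options_schema:
        raise ValueError(
            "A schema was provided both as an input and in stats_options_json."
        )


# Schema supplied only as an input: accepted.
check_schema_conflict(
    {"input_data": ["<examples>"], "schema": ["<schema>"]},
    {"stats_options_json": json.dumps({})},
)

# Schema supplied in both places: rejected.
try:
    check_schema_conflict(
        {"input_data": ["<examples>"], "schema": ["<schema>"]},
        {"stats_options_json": json.dumps({"schema": "<schema>"})},
    )
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Supplying the schema through exactly one of the two routes avoids the error; which route to prefer depends on whether the schema already exists as a pipeline artifact.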