tfx.components.statistics_gen.executor.Executor

Computes statistics over input training data for example validation.

Inherits From: BaseExecutor

The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Beam and appropriate algorithms to scale to large datasets.

To include StatisticsGen in a TFX pipeline, configure your pipeline similarly to the example at https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_simple.py#L75

Child Classes

class Context

Methods

Do

Computes stats for each split of input using tensorflow_data_validation.

Args
input_dict Input dict from input key to a list of Artifacts.

  • input_data: A list of type standard_artifacts.Examples. This should contain both the 'train' and 'eval' splits.
  • schema: Optionally, a list of type standard_artifacts.Schema. When the stats_options exec_property also contains a schema, this input should not be provided.
output_dict Output dict from output key to a list of Artifacts.
  • output: A list of type standard_artifacts.ExampleStatistics. This should contain both the 'train' and 'eval' splits.
exec_properties A dict of execution properties.
  • stats_options_json: Optionally, a JSON representation of StatsOptions. When a schema is provided as an input, the StatsOptions value should not also contain a schema.

Raises
ValueError when a schema is provided both as an input and as part of the StatsOptions exec_property.

Returns
None
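
The rule that a schema may come from the schema input or from StatsOptions, but not both, can be illustrated with a small standalone sketch. This is not the executor's actual code: check_schema_conflict is a hypothetical helper, and the "schema" key inside the decoded stats_options_json payload is an assumption, since the serialized layout of StatsOptions is not described here.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: the key under which a schema would appear in the decoded
# stats_options_json payload. The real serialized layout may differ.
ASSUMED_SCHEMA_KEY = "schema"


def check_schema_conflict(
    input_dict: Dict[str, List[Any]],
    exec_properties: Dict[str, Any],
    schema_key: str = ASSUMED_SCHEMA_KEY,
) -> None:
    """Hypothetical helper: raise ValueError if a schema is supplied twice."""
    # A schema counts as provided via input when the 'schema' list is non-empty.
    schema_from_input = bool(input_dict.get("schema"))

    # stats_options_json is optional; treat a missing value as "no options".
    stats_options_json: Optional[str] = exec_properties.get("stats_options_json")
    stats_options = json.loads(stats_options_json) if stats_options_json else {}
    schema_from_options = stats_options.get(schema_key) is not None

    if schema_from_input and schema_from_options:
        raise ValueError(
            "A schema was provided both as an input and as part of "
            "stats_options_json; provide it in only one place."
        )


# Only one source supplies a schema, so this call returns without raising.
check_schema_conflict(
    {"input_data": ["examples"], "schema": ["schema_artifact"]},
    {"stats_options_json": None},
)
```

Passing a non-empty schema list together with a stats_options_json payload that contains the assumed schema key would raise ValueError, matching the Raises entry above.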