Computes statistics over input training data for example validation.
The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Beam and appropriate algorithms to scale to large datasets.
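One reason statistics of this kind scale with Beam is that many of them can be computed per shard and then merged. The sketch below illustrates that general idea in plain Python. It is not the algorithm used by StatisticsGen or tensorflow_data_validation; `NumericStats` and `summarize` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional


@dataclass
class NumericStats:
    """Hypothetical accumulator for one numeric feature (not a TFDV type)."""
    count: int = 0
    missing: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: Optional[float]) -> "NumericStats":
        # Fold a single value into the accumulator; None counts as missing.
        if value is None:
            self.missing += 1
        else:
            self.count += 1
            self.total += value
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)
        return self

    def merge(self, other: "NumericStats") -> "NumericStats":
        # Combine two partial results; the merge order does not matter.
        return NumericStats(
            count=self.count + other.count,
            missing=self.missing + other.missing,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def summarize(values: Iterable[Optional[float]]) -> NumericStats:
    # Summarize one shard independently of all other shards.
    return reduce(lambda acc, v: acc.add(v), values, NumericStats())


# Two shards that could be processed by different workers.
shard_a = [1.0, 2.0, None]
shard_b = [3.0, 6.0]
merged = summarize(shard_a).merge(summarize(shard_b))
```

Merging the two partial summaries gives the same result as summarizing the concatenated data in one pass, which is the property a distributed runner relies on.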
To include StatisticsGen in a TFX pipeline, configure your pipeline similarly to the example at https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_simple.py#L75.
Constructs a Beam-based executor.
Do( input_dict, output_dict, exec_properties )
Computes stats for each split of input using tensorflow_data_validation.
input_dict: Input dict from input key to a list of Artifacts.
- input_data: A list of 'ExamplesPath' type. This should contain both 'train' and 'eval' splits.
output_dict: Output dict from output key to a list of Artifacts.
- output: A list of 'ExampleStatisticsPath' type. This should contain both 'train' and 'eval' splits.
exec_properties: A dict of execution properties. Not used yet.
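The shape of the three arguments can be sketched as follows. This is a minimal illustration in plain Python, not TFX code: `FakeArtifact`, `missing_splits`, and the `/tmp/...` paths are hypothetical stand-ins, and the real artifact class carries more fields than shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class FakeArtifact:
    """Hypothetical stand-in for a TFX artifact; only the fields used here."""
    type_name: str
    split: str
    uri: str


REQUIRED_SPLITS = {"train", "eval"}


def missing_splits(artifacts: List[FakeArtifact]) -> Set[str]:
    # Return the required splits that no artifact in the list covers.
    return REQUIRED_SPLITS - {a.split for a in artifacts}


# One artifact per split under the 'input_data' key (paths are made up).
input_dict: Dict[str, List[FakeArtifact]] = {
    "input_data": [
        FakeArtifact("ExamplesPath", "train", "/tmp/examples/train"),
        FakeArtifact("ExamplesPath", "eval", "/tmp/examples/eval"),
    ]
}

# One artifact per split under the 'output' key (paths are made up).
output_dict: Dict[str, List[FakeArtifact]] = {
    "output": [
        FakeArtifact("ExampleStatisticsPath", "train", "/tmp/stats/train"),
        FakeArtifact("ExampleStatisticsPath", "eval", "/tmp/stats/eval"),
    ]
}

# Described above as not used yet, so an empty dict is passed.
exec_properties: Dict[str, object] = {}
```

A check such as `missing_splits(input_dict["input_data"])` returns an empty set when both splits are present, which can be a useful guard before invoking an executor that expects both.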