Computes statistics over input training data for example validation.
Inherits From: BaseExecutor
tfx.components.statistics_gen.executor.Executor(
    context: Optional[tfx.dsl.components.base.base_executor.BaseExecutor.Context] = None
)
The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Beam and appropriate algorithms to scale to large datasets.
To include StatisticsGen in a TFX pipeline, configure your pipeline similarly to the example at https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_simple.py#L75
Child Classes
Methods
Do
Do(
input_dict: Dict[Text, List[types.Artifact]],
output_dict: Dict[Text, List[types.Artifact]],
exec_properties: Dict[Text, Any]
) -> None
Computes stats for each split of input using tensorflow_data_validation.
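To make the per-split behaviour concrete, here is a minimal sketch in plain Python. It does not use TFX or tensorflow_data_validation: `compute_stats_per_split`, the `num_present`/`num_missing` counters, and the toy records are hypothetical stand-ins chosen for illustration, and the real executor computes far richer statistics with Beam.

```python
from collections import defaultdict
from typing import Any, Dict, List


def compute_stats_per_split(
    splits: Dict[str, List[Dict[str, Any]]]
) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Hypothetical stand-in: per split, count present and missing values per feature."""
    result = {}
    for split_name, records in splits.items():
        feature_stats = defaultdict(lambda: {"num_present": 0, "num_missing": 0})
        # Collect every feature name seen anywhere in this split.
        feature_names = {name for record in records for name in record}
        for record in records:
            for name in feature_names:
                if record.get(name) is None:
                    feature_stats[name]["num_missing"] += 1
                else:
                    feature_stats[name]["num_present"] += 1
        result[split_name] = dict(feature_stats)
    return result


# Toy data (assumed for illustration): one entry per split, as in 'train' and 'eval'.
toy_splits = {
    "train": [{"fare": 12.5, "tips": 2.0}, {"fare": 7.0, "tips": None}],
    "eval": [{"fare": 9.0, "tips": 1.0}],
}
stats = compute_stats_per_split(toy_splits)
```

Each split is processed independently, so the result holds one statistics entry per split name.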
Args | |
---|---|
input_dict | Input dict from input key to a list of Artifacts. |
output_dict | Output dict from output key to a list of Artifacts of type standard_artifacts.ExampleStatistics. This should contain both the 'train' and 'eval' splits. |
exec_properties | A dict of execution properties. |
Raises | |
---|---|
ValueError | When a schema is provided both as an input and as part of the StatsOptions exec_property. |
Returns | |
---|---|
None |
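
The ValueError condition described under Raises can be illustrated with a small sketch. The helper `check_schema_conflict`, the key names `"schema"` and `"stats_options_json"`, and the JSON layout of the options are assumptions made for illustration; they are not taken from the executor's source and may differ from what it actually uses.

```python
import json
from typing import Any, Dict, List


def check_schema_conflict(
    input_dict: Dict[str, List[Any]], exec_properties: Dict[str, Any]
) -> None:
    """Raise ValueError if a schema arrives through both channels (illustrative only)."""
    # Assumed key: a non-empty artifact list under "schema" means a schema input.
    schema_as_input = bool(input_dict.get("schema"))
    # Assumed key: the stats options are passed as a JSON string.
    options_json = exec_properties.get("stats_options_json")
    options = json.loads(options_json) if options_json else {}
    schema_in_options = options.get("schema") is not None
    if schema_as_input and schema_in_options:
        raise ValueError(
            "A schema was provided both as an input and in the stats options."
        )


# Supplying the schema through only one channel passes the check.
check_schema_conflict({"schema": ["schema_artifact"]}, {})

# Supplying it through both channels raises.
try:
    check_schema_conflict(
        {"schema": ["schema_artifact"]},
        {"stats_options_json": json.dumps({"schema": {}})},
    )
except ValueError as err:
    conflict_message = str(err)
```

Checking for the conflict before any statistics are computed means a misconfigured pipeline fails fast rather than after an expensive Beam job.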