tfx.components.StatisticsGen

Official TFX StatisticsGen component.

Inherits From: BaseComponent

tfx.components.StatisticsGen(
    examples=None, schema=None, stats_options=None, output=None, input_data=None,
    instance_name=None
)

The StatisticsGen component generates feature statistics and random samples over training data, which can be used for visualization and validation. StatisticsGen uses Apache Beam and approximate algorithms to scale to large datasets.

Please see https://www.tensorflow.org/tfx/data_validation for more details.
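
One way to picture how per-feature statistics can scale across a distributed runner is that partial summaries are computed on separate shards of data and then merged. The following is a minimal pure-Python sketch of that idea, not the TFDV or Beam implementation. `NumericStats` and `summarize` are hypothetical names introduced for illustration, and the statistics here are exact rather than approximate to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass
class NumericStats:
    """Running summary of one numeric feature (hypothetical helper)."""
    count: int = 0
    missing: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value):
        """Fold one value into the summary; None counts as missing."""
        if value is None:
            self.missing += 1
            return
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        """Combine two partial summaries, e.g. from two data shards."""
        return NumericStats(
            count=self.count + other.count,
            missing=self.missing + other.missing,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        """Mean of the non-missing values, or NaN if there are none."""
        return self.total / self.count if self.count else float("nan")


def summarize(values):
    """Build a summary for one shard of feature values."""
    stats = NumericStats()
    for value in values:
        stats.add(value)
    return stats


# Two shards are summarized independently, then merged into one result.
shard_a = summarize([1.0, 2.0, None])
shard_b = summarize([3.0, 6.0])
combined = shard_a.merge(shard_b)
```

Because `merge` only needs the two partial summaries, the shards never have to be held in memory together, which is the property that lets this style of computation be spread over many workers.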

Example

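  from tfx.components import StatisticsGen

  # Computes statistics over the output of an upstream ExampleGen component
  # (example_gen), for visualization and example validation.
  statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])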

Args:

  • examples: Required. A Channel of ExamplesPath type, likely generated by the ExampleGen component. It must contain two splits labeled train and eval.
  • schema: A Schema channel to use for automatically configuring the value of stats options passed to TFDV.
  • stats_options: The StatsOptions instance to configure optional TFDV behavior. When stats_options.schema is set, it will be used instead of the schema channel input. Due to the requirement that stats_options be serialized, the slicer functions and custom stats generators are dropped and are therefore not usable.
  • output: ExampleStatisticsPath channel for statistics of each split provided in the input examples.
  • input_data: Backwards compatibility alias for the examples argument.
  • instance_name: Optional name assigned to this specific instance of StatisticsGen. Required only if multiple StatisticsGen components are declared in the same pipeline.
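
The two rules described for stats_options above (its schema takes precedence over the schema channel, and function-valued settings do not survive serialization) can be pictured with a small pure-Python sketch. This does not call TFX or TFDV; `resolve_schema`, `serialize_options`, and the dictionary keys are hypothetical names chosen for illustration, and JSON is assumed as the serialization format only to make the effect visible.

```python
import json


def resolve_schema(stats_options_schema, schema_channel_schema):
    """Return the schema to use: the one set on stats_options wins."""
    if stats_options_schema is not None:
        return stats_options_schema
    return schema_channel_schema


def serialize_options(options):
    """Serialize an options dict to JSON, dropping callable values.

    Functions cannot be represented in JSON, so they are omitted here.
    """
    kept = {key: value for key, value in options.items() if not callable(value)}
    return json.dumps(kept, sort_keys=True)


# Illustrative options: one plain value and one function-valued entry.
options = {
    "sample_rate": 0.5,
    "slicer": lambda example: "all",  # hypothetical; dropped on serialization
}
restored = json.loads(serialize_options(options))
```

After the round trip, `restored` holds only the plain value, which mirrors why slicer functions and custom generators passed this way are not usable.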

Attributes:

  • component_id: DEPRECATED FUNCTION

  • component_type: DEPRECATED FUNCTION

  • downstream_nodes

  • exec_properties

  • id: Node id, unique across all TFX nodes in a pipeline.

    If instance name is available, node_id will be: . otherwise, node_id will be:

  • inputs

  • outputs

  • type

  • upstream_nodes

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Methods

add_downstream_node

add_downstream_node(
    downstream_node
)

add_upstream_node

add_upstream_node(
    upstream_node
)

from_json_dict

@classmethod
from_json_dict(
    cls, dict_data
)

Convert from dictionary data to an object.

get_id

@classmethod
get_id(
    cls, instance_name=None
)

Gets the id of a node.

This can be used during pipeline authoring time. For example:

  from tfx.components import Trainer

  resolver = ResolverNode(..., model=Channel(
      type=Model, producer_component_id=Trainer.get_id('my_trainer')))

Args:

  • instance_name: (Optional) instance name of a node. If given, the instance name will be taken into consideration when generating the id.

Returns:

an id for the node.
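
The point of making get_id a classmethod is that an id can be computed from the class alone, before any node object exists. A minimal sketch of that pattern follows. `Node` and `Trainer` here are hypothetical stand-ins, not the TFX classes, and the `ClassName.instance_name` format is an assumption made for illustration rather than something this page states.

```python
class Node:
    """Hypothetical stand-in for a pipeline node base class."""

    @classmethod
    def get_id(cls, instance_name=None):
        # Assumed format: the class name, plus the instance name when given.
        if instance_name:
            return f"{cls.__name__}.{instance_name}"
        return cls.__name__


class Trainer(Node):
    """Hypothetical component; no instance is created below."""


# Both ids are derived from the class, without constructing a Trainer.
default_id = Trainer.get_id()
named_id = Trainer.get_id("my_trainer")
```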

to_json_dict

to_json_dict()

Convert from an object to a JSON serializable dictionary.
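
The to_json_dict and from_json_dict pair describes a round trip: object to dictionary, then dictionary back to object. The sketch below shows one simple way such a pair can be written in plain Python. It is not the TFX implementation; `Jsonable` and `NodeInfo` are hypothetical classes, and it assumes every attribute is already a JSON-compatible value.

```python
import json


class Jsonable:
    """Hypothetical base class: round-trips plain attributes through a dict."""

    def to_json_dict(self):
        # Copy instance attributes into a JSON-serializable dictionary.
        return dict(vars(self))

    @classmethod
    def from_json_dict(cls, dict_data):
        # Rebuild an object without calling __init__, then restore attributes.
        obj = cls.__new__(cls)
        obj.__dict__.update(dict_data)
        return obj


class NodeInfo(Jsonable):
    """Hypothetical object holding two simple attributes."""

    def __init__(self, node_id, instance_name=None):
        self.node_id = node_id
        self.instance_name = instance_name


# Serialize to a JSON string, then rebuild an equivalent object from it.
original = NodeInfo("StatisticsGen", instance_name=None)
payload = json.dumps(original.to_json_dict())
restored = NodeInfo.from_json_dict(json.loads(payload))
```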

Class Variables

  • EXECUTOR_SPEC