tfx.components.example_gen.base_example_gen_executor.BaseExampleGenExecutor


Class BaseExampleGenExecutor

Generic TFX example gen base executor.

Inherits From: BaseExecutor

The base ExampleGen executor takes a configuration and converts external data sources to TensorFlow Examples (tf.Example).

The common configuration (defined in https://github.com/tensorflow/tfx/blob/master/tfx/proto/example_gen.proto#L44) describes the general properties of the input data and the shared instructions for producing output data.

The conversion is done in GenerateExamplesByBeam as a Beam pipeline, which validates the configuration, reads the external data sources, converts each record in the input source to a tf.Example if needed, and splits the examples if an output split config is given. The executor's Do method then writes the resulting splits to the output path.

For simple custom ExampleGens, the details of transforming input data records to tf.Examples are expected to be given in GetInputSourceToExamplePTransform, which returns a Beam PTransform with the actual implementation. For complex use cases, such as joining multiple data sources or interpreting the configuration differently, the custom ExampleGen can override GenerateExamplesByBeam.
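The two extension points described above can be sketched in plain Python, without Beam. This is only an illustration of the pattern: the class names, method names, and the function-based "transform" below are hypothetical stand-ins, not the TFX API.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in: a "transform" here is a plain function from a split
# pattern to an iterable of records, not a real Beam PTransform.
Transform = Callable[[str], Iterable[dict]]


class SketchBaseExecutor:
    """Plain-Python stand-in for the base executor (not the TFX class)."""

    def get_input_source_to_example_transform(self) -> Transform:
        # Simple custom executors override only this hook.
        raise NotImplementedError

    def generate_examples(
        self, split_patterns: Dict[str, str]
    ) -> Dict[str, List[dict]]:
        # Complex executors can override this whole method instead.
        transform = self.get_input_source_to_example_transform()
        # The hook is applied to each input split separately.
        return {
            name: list(transform(pattern))
            for name, pattern in split_patterns.items()
        }


class InMemoryExecutor(SketchBaseExecutor):
    """Simple case: only the per-split conversion is customized."""

    def get_input_source_to_example_transform(self) -> Transform:
        def read(pattern: str) -> Iterable[dict]:
            # Pretend each pattern yields two records.
            return [{"source": pattern, "row": i} for i in range(2)]

        return read


splits = InMemoryExecutor().generate_examples(
    {"train": "train/*", "eval": "eval/*"}
)
```

The base class owns the overall flow and the subclass supplies only the per-split conversion, which is the division of labor the paragraph above describes for simple custom ExampleGens.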

__init__


__init__(context=None)

Constructs a Beam-based executor.

Child Classes

class Context

Methods

Do


Do(
    input_dict,
    output_dict,
    exec_properties
)

Takes an input data source and generates TF Example splits.

Args:

  • input_dict: Input dict from input key to a list of Artifacts. The contents depend on the specific ExampleGen implementation.
  • output_dict: Output dict from output key to a list of Artifacts.
    • examples: splits of tf.Examples.
  • exec_properties: A dict of execution properties. The contents depend on the specific ExampleGen implementation.
    • input: JSON string of example_gen_pb2.Input instance, providing input configuration.
    • output: JSON string of example_gen_pb2.Output instance, providing output configuration.
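As a rough illustration of the exec_properties argument described above, the sketch below builds the two JSON strings with the standard library only. The field names (splits, name, pattern, split_config, hash_buckets) are assumptions modelled on example_gen.proto and should be checked against the proto definition; real code would normally serialize example_gen_pb2 messages instead of hand-written dicts.

```python
import json

# Assumed field names, modelled on example_gen.proto: an Input holds named
# splits with file patterns; an Output holds a split config with hash buckets.
input_config = {
    "splits": [
        {"name": "single_split", "pattern": "*"},
    ]
}
output_config = {
    "split_config": {
        "splits": [
            {"name": "train", "hash_buckets": 2},
            {"name": "eval", "hash_buckets": 1},
        ]
    }
}

# Both configurations travel as JSON strings, not as proto objects.
exec_properties = {
    "input": json.dumps(input_config),
    "output": json.dumps(output_config),
}

# An executor would parse the strings back before using them.
parsed_output = json.loads(exec_properties["output"])
output_split_names = [
    split["name"] for split in parsed_output["split_config"]["splits"]
]
```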

Returns:

None

GenerateExamplesByBeam


GenerateExamplesByBeam(
    pipeline,
    input_dict,
    exec_properties
)

Converts input source to TF example splits based on configs.

A custom ExampleGen executor should provide GetInputSourceToExamplePTransform for converting each input split to TF Examples. Override this GenerateExamplesByBeam method instead if complex logic is needed, e.g., custom splitting logic.

Args:

  • pipeline: Beam pipeline.
  • input_dict: Input dict from input key to a list of Artifacts. The contents depend on the specific ExampleGen implementation.
  • exec_properties: A dict of execution properties. The contents depend on the specific ExampleGen implementation.
    • input: JSON string of example_gen_pb2.Input instance, providing input configuration.
    • output: JSON string of example_gen_pb2.Output instance, providing output configuration.

Returns:

Dict of Beam PCollections keyed by split name. Each PCollection is a single output split that contains serialized TF Examples.
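To make the shape of this return value concrete, here is a minimal plain-Python sketch that partitions serialized records into a dict keyed by split name. It assumes splitting by hash buckets, uses hashlib as a stand-in hash, and uses lists in place of PCollections; the function name and parameters are hypothetical and this is not the TFX implementation.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def partition_by_hash(
    records: Sequence[bytes],
    split_buckets: Sequence[Tuple[str, int]],
) -> Dict[str, List[bytes]]:
    """Assigns each serialized record to a named split by hash bucket."""
    total = sum(buckets for _, buckets in split_buckets)
    result: Dict[str, List[bytes]] = {name: [] for name, _ in split_buckets}
    for record in records:
        # Stable hash of the record bytes; sha256 is a stand-in choice here.
        digest = hashlib.sha256(record).digest()
        bucket = int.from_bytes(digest[:8], "big") % total
        # Walk the cumulative bucket ranges to find the owning split.
        upper = 0
        for name, buckets in split_buckets:
            upper += buckets
            if bucket < upper:
                result[name].append(record)
                break
    return result


records = [f"record-{i}".encode() for i in range(100)]
splits = partition_by_hash(records, [("train", 2), ("eval", 1)])
```

Because the assignment depends only on the record bytes, rerunning the partition on the same input yields the same splits, and each record lands in exactly one split.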

GetInputSourceToExamplePTransform


GetInputSourceToExamplePTransform()

Returns PTransform for converting input source to TF examples.

Note that each input split is transformed by this function separately. For complex use cases, consider overriding GenerateExamplesByBeam instead.

Here is an example PTransform:

    @beam.ptransform_fn
    @beam.typehints.with_input_types(beam.Pipeline)
    @beam.typehints.with_output_types(tf.train.Example)
    def ExamplePTransform(
        pipeline: beam.Pipeline,
        input_dict: Dict[Text, List[types.Artifact]],
        exec_properties: Dict[Text, Any],
        split_pattern: Text) -> beam.pvalue.PCollection