
tfx.components.example_gen.custom_executors.parquet_executor.Executor


Class Executor

TFX ExampleGen executor for processing data in Parquet format.

Inherits From: BaseExampleGenExecutor

Data type conversion:

  • Integer types are converted to tf.train.Feature with tf.train.Int64List.
  • Float types are converted to tf.train.Feature with tf.train.FloatList.
  • String types are converted to tf.train.Feature with tf.train.BytesList, using UTF-8 encoding.

Note that a single value is converted to a list containing that one value, and a missing value is converted to an empty tf.train.Feature(). Some Parquet data may lose precision during conversion, e.g., int96 values.

For details, check the dict_to_example function in example_gen.utils.
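The conversion rules above can be illustrated with a small plain-Python sketch. This is not dict_to_example itself: value_to_feature_sketch and row_to_features_sketch are hypothetical helpers that describe each feature as a (list kind, values) pair instead of building real tf.train.Feature protos, so TensorFlow is not needed. The sketch covers plain Python int, float and str values and homogeneous lists of them only; the handling of empty lists is an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

# (list kind, values); a list kind of None stands for an empty
# tf.train.Feature(). Plain tuples are used so TensorFlow is not needed.
FeatureSketch = Tuple[Optional[str], List[Any]]


def value_to_feature_sketch(value: Any) -> FeatureSketch:
  """Hypothetical helper mirroring the documented conversion rules."""
  if value is None:
    # A missing value becomes an empty feature.
    return (None, [])
  # A single value becomes a one-element list.
  values = value if isinstance(value, list) else [value]
  if not values:
    # Assumption: an empty list is treated like a missing value.
    return (None, [])
  # Assumes a homogeneous list: the first element decides the list kind.
  first = values[0]
  if isinstance(first, int):
    return ('int64_list', [int(v) for v in values])
  if isinstance(first, float):
    return ('float_list', [float(v) for v in values])
  if isinstance(first, str):
    # Strings are stored as UTF-8 encoded bytes.
    return ('bytes_list', [v.encode('utf-8') for v in values])
  raise TypeError('unsupported value type: %s' % type(first).__name__)


def row_to_features_sketch(row: Dict[str, Any]) -> Dict[str, FeatureSketch]:
  """Converts one row (column name -> value) into sketched features."""
  return {name: value_to_feature_sketch(value) for name, value in row.items()}


row = {'age': 42, 'score': 0.5, 'name': 'café', 'tags': ['a', 'b'], 'note': None}
features = row_to_features_sketch(row)
# features['name'] == ('bytes_list', [b'caf\xc3\xa9'])
# features['note'] == (None, [])
```

Keep in mind that a real tf.train.FloatList stores 32-bit floats, which is another place where precision can be lost when 64-bit Parquet doubles are converted.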

Example usage:

from tfx.components.example_gen.component import FileBasedExampleGen
from tfx.components.example_gen.custom_executors import parquet_executor
from tfx.utils.dsl_utils import external_input

# parquet_dir_path: path to the directory that contains the input Parquet files.
example_gen = FileBasedExampleGen(
    input_base=external_input(parquet_dir_path),
    executor_class=parquet_executor.Executor)

__init__


__init__(context=None)

Constructs a Beam-based executor.

Child Classes

class Context

Methods

Do


Do(
    input_dict,
    output_dict,
    exec_properties
)

Takes an input data source and generates TF Example splits.

Args:

  • input_dict: Input dict from input key to a list of Artifacts. Depends on detailed example gen implementation.
  • output_dict: Output dict from output key to a list of Artifacts.
    • examples: TF Example splits.
  • exec_properties: A dict of execution properties. Depends on detailed example gen implementation.
    • input: JSON string of example_gen_pb2.Input instance, providing input configuration.
    • output: JSON string of example_gen_pb2.Output instance, providing output configuration.

Returns:

None
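To make the shape of exec_properties concrete, the sketch below builds the two JSON strings with the standard json module and reads the split information back. The field names (splits, name, pattern, splitConfig, hashBuckets) are assumed to follow the JSON form of example_gen_pb2.Input and example_gen_pb2.Output in this release, and the single input split plus 2:1 train/eval hash split is assumed to resemble the default configuration; input_split_patterns and output_split_names are hypothetical helpers. Check the proto definitions before relying on these names.

```python
import json
from typing import Dict, List

# Assumed layout: one input split covering all files ...
input_config = {
    'splits': [{'name': 'single_split', 'pattern': '*'}],
}
# ... hash-split into train and eval with a 2:1 bucket ratio.
output_config = {
    'splitConfig': {
        'splits': [
            {'name': 'train', 'hashBuckets': 2},
            {'name': 'eval', 'hashBuckets': 1},
        ]
    }
}

# Both properties are passed to Do() as JSON strings, not as dicts.
exec_properties = {
    'input': json.dumps(input_config),
    'output': json.dumps(output_config),
}


def input_split_patterns(props: Dict[str, str]) -> Dict[str, str]:
  """Hypothetical helper: maps input split name -> file pattern."""
  config = json.loads(props['input'])
  return {s['name']: s['pattern'] for s in config.get('splits', [])}


def output_split_names(props: Dict[str, str]) -> List[str]:
  """Hypothetical helper: lists the configured output split names."""
  config = json.loads(props['output'])
  return [s['name'] for s in config.get('splitConfig', {}).get('splits', [])]
```

In a real pipeline these strings would normally be produced from the proto messages (for example with protobuf's json_format module) rather than written by hand.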

GenerateExamplesByBeam


GenerateExamplesByBeam(
    pipeline,
    input_dict,
    exec_properties
)

Converts input source to TF example splits based on configs.

A custom ExampleGen executor should provide GetInputSourceToExamplePTransform for converting an input split to TF Examples. Override this GenerateExamplesByBeam method instead if more complex logic is needed, e.g., custom splitting logic.

Args:

  • pipeline: Beam pipeline.
  • input_dict: Input dict from input key to a list of Artifacts. Depends on detailed example gen implementation.
  • exec_properties: A dict of execution properties. Depends on detailed example gen implementation.
    • input: JSON string of example_gen_pb2.Input instance, providing input configuration.
    • output: JSON string of example_gen_pb2.Output instance, providing output configuration.

Returns:

A dict of Beam PCollections keyed by split name. Each PCollection is a single output split that contains serialized TF Examples.
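The splitting step is where a custom GenerateExamplesByBeam would differ. The sketch below assumes a hash-bucket scheme: each serialized example is hashed into sum(hash_buckets) buckets, and consecutive bucket ranges are assigned to the splits in order. SHA-256 from hashlib stands in for whatever fingerprint function TFX actually uses, plain lists stand in for PCollections, and partition_by_hash_buckets is a hypothetical helper, not part of the TFX API. Its return value has the same shape as the method's result: a dict keyed by split name.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def partition_by_hash_buckets(
    records: Sequence[bytes],
    split_buckets: Sequence[Tuple[str, int]],
) -> Dict[str, List[bytes]]:
  """Assigns each serialized record to exactly one split.

  split_buckets lists (split name, hash_buckets) pairs in order; for
  example [('train', 2), ('eval', 1)] sends roughly 2/3 of records to train.
  """
  total = sum(n for _, n in split_buckets)
  if total <= 0:
    raise ValueError('total number of hash buckets must be positive')
  # Cumulative upper bounds, e.g. [('train', 2), ('eval', 3)].
  bounds = []
  running = 0
  for name, n in split_buckets:
    running += n
    bounds.append((name, running))
  result = {name: [] for name, _ in split_buckets}
  for record in records:
    # Deterministic stand-in hash: the same record always lands in the
    # same bucket, so splits are stable across runs.
    digest = hashlib.sha256(record).digest()
    bucket = int.from_bytes(digest[:8], 'big') % total
    # A custom splitting rule would replace this bucket-to-split mapping.
    for name, upper in bounds:
      if bucket < upper:
        result[name].append(record)
        break
  return result


records = [('record-%d' % i).encode('utf-8') for i in range(30)]
splits = partition_by_hash_buckets(records, [('train', 2), ('eval', 1)])
```

Hashing the serialized example, rather than drawing random numbers, keeps the split assignment reproducible when the pipeline is re-run on the same data.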

GetInputSourceToExamplePTransform


GetInputSourceToExamplePTransform()

Returns a PTransform that converts Parquet data to TF Examples.
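According to the description of GenerateExamplesByBeam, a custom executor only needs to supply the per-split conversion through this method. As a plain-Python approximation of the read step for one input split, the sketch below assumes that the split's pattern is joined onto the input base directory and every matching file is read row by row. split_to_rows_sketch and read_text_rows are hypothetical; a Beam implementation would instead read the joined pattern with beam.io.ReadFromParquet and map each row dict through a converter such as dict_to_example.

```python
import glob
import os
import tempfile
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def split_to_rows_sketch(input_base: str, split_pattern: str,
                         read_rows: Callable[[str], Iterable[Row]]) -> List[Row]:
  """Reads every file matching input_base/split_pattern with read_rows."""
  pattern = os.path.join(input_base, split_pattern)
  rows = []
  # Sorted for a deterministic order; Beam makes no ordering guarantee.
  for path in sorted(glob.glob(pattern)):
    rows.extend(read_rows(path))
  return rows


def read_text_rows(path: str) -> List[Row]:
  """Toy reader (one row per non-empty line) standing in for a Parquet reader."""
  with open(path, encoding='utf-8') as f:
    return [{'line': line.strip()} for line in f if line.strip()]


if __name__ == '__main__':
  with tempfile.TemporaryDirectory() as base:
    for name, text in [('part-0.txt', 'a\nb\n'), ('part-1.txt', 'c\n'),
                       ('skip.csv', 'x\n')]:
      with open(os.path.join(base, name), 'w', encoding='utf-8') as f:
        f.write(text)
    # Only the files matching the split pattern are read.
    print(split_to_rows_sketch(base, '*.txt', read_text_rows))
    # prints [{'line': 'a'}, {'line': 'b'}, {'line': 'c'}]
```

Passing the reader in as a parameter keeps the pattern-matching logic independent of the file format, which mirrors how the read and convert steps are separate stages in a Beam pipeline.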