tfx.components.transform.executor.Executor

Class Executor

Transform executor.

Inherits From: BaseExecutor

__init__

__init__(context=None)

Constructs a Beam-based executor.

Child Classes

class Context

Methods

Do

Do(
    input_dict,
    output_dict,
    exec_properties
)

TensorFlow Transform executor entrypoint.

This implements BaseExecutor.Do() and is invoked by orchestration systems. It is not intended for manual use or further customization. To use the executor directly, call the Transform() function instead, which takes an input format with no artifact dependency.

Args:

  • input_dict: Input dict from input key to a list of artifacts, including:
    • input_data: A list of artifacts of type 'ExamplesPath', which should contain two splits: 'train' and 'eval'.
    • schema: A list of artifacts of type 'SchemaPath', which should contain a single schema artifact.
  • output_dict: Output dict from key to a list of artifacts, including:
    • transform_output: Output of 'tf.Transform', which includes an exported TensorFlow graph suitable for both training and serving.
    • transformed_examples: Materialized transformed examples, including both 'train' and 'eval' splits.
  • exec_properties: A dict of execution properties, including one of:
    • module_file: The file path to a Python module file, from which the 'preprocessing_fn' function will be loaded.
    • preprocessing_fn: The module path to a Python function that implements 'preprocessing_fn'.

Returns:

None
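
The choice between module_file and preprocessing_fn in exec_properties can be sketched as a small check. This is only an illustration: select_preprocessing_source is a hypothetical helper, not part of TFX, and it assumes that supplying both keys or neither should be treated as an error. The path in the usage line is also made up.

```python
def select_preprocessing_source(exec_properties):
    """Return (key, value) for whichever preprocessing source is set.

    Hypothetical helper, not part of TFX: it only illustrates the
    "one of module_file / preprocessing_fn" rule described above.
    """
    module_file = exec_properties.get("module_file")
    preprocessing_fn = exec_properties.get("preprocessing_fn")
    # Assumption: supplying both keys, or neither, is treated as an error.
    if bool(module_file) == bool(preprocessing_fn):
        raise ValueError(
            "Set exactly one of 'module_file' or 'preprocessing_fn'."
        )
    if module_file:
        return "module_file", module_file
    return "preprocessing_fn", preprocessing_fn


# Hypothetical path, for illustration only.
source = select_preprocessing_source({"module_file": "/tmp/preprocessing.py"})
```

Running such a check before handing exec_properties to an orchestrator would surface a misconfigured component early, rather than partway through a pipeline run.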

Transform

Transform(
    inputs,
    outputs,
    status_file
)

Executes on request.

This is the implementation part of the Transform executor. It is intended for using or extending the executor without artifact dependencies.

Args:

  • inputs: A dictionary of labelled input values, including:
    • labels.COMPUTE_STATISTICS_LABEL: Whether to compute statistics.
    • labels.SCHEMA_PATH_LABEL: Path to schema file.
    • labels.EXAMPLES_DATA_FORMAT_LABEL: Example data format.
    • labels.ANALYZE_DATA_PATHS_LABEL: Paths or path patterns of the data to analyze.
    • labels.ANALYZE_PATHS_FILE_FORMATS_LABEL: File formats of the paths of the data to analyze.
    • labels.TRANSFORM_DATA_PATHS_LABEL: Paths or path patterns of the data to transform.
    • labels.TRANSFORM_PATHS_FILE_FORMATS_LABEL: File formats of the paths of the data to transform.
    • labels.TFT_STATISTICS_USE_TFDV_LABEL: Whether to use TFDV to compute statistics.
    • labels.MODULE_FILE: Path to a Python module that contains the preprocessing_fn, optional.
    • labels.PREPROCESSING_FN: Path to a Python function that implements preprocessing_fn, optional.
  • outputs: A dictionary of labelled output values, including:
    • labels.PER_SET_STATS_OUTPUT_PATHS_LABEL: Paths to statistics output, optional.
    • labels.TRANSFORM_METADATA_OUTPUT_PATH_LABEL: A path to TFTransformOutput output.
    • labels.TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL: Paths to transform materialization.
    • labels.TEMP_OUTPUT_LABEL: A path to a temporary directory.
  • status_file: Where the status should be written (not yet implemented).
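
The shape of the inputs and outputs dictionaries listed above can be sketched in plain Python. Everything concrete here is an assumption for illustration: the string keys are stand-ins for the constants in the labels module (real code would use labels.SCHEMA_PATH_LABEL and so on, whose actual values are not given in this reference), all paths and format values are placeholders, and missing_labels is a hypothetical helper. The sketch treats a label as expected simply because the list above does not mark it optional.

```python
# Stand-ins for constants in the labels module (e.g. labels.SCHEMA_PATH_LABEL).
# The real constant values are not given in this reference, so the constant
# names themselves are used as placeholder keys.
INPUT_LABELS_NOT_MARKED_OPTIONAL = (
    "COMPUTE_STATISTICS_LABEL",
    "SCHEMA_PATH_LABEL",
    "EXAMPLES_DATA_FORMAT_LABEL",
    "ANALYZE_DATA_PATHS_LABEL",
    "ANALYZE_PATHS_FILE_FORMATS_LABEL",
    "TRANSFORM_DATA_PATHS_LABEL",
    "TRANSFORM_PATHS_FILE_FORMATS_LABEL",
    "TFT_STATISTICS_USE_TFDV_LABEL",
)
OUTPUT_LABELS_NOT_MARKED_OPTIONAL = (
    "TRANSFORM_METADATA_OUTPUT_PATH_LABEL",
    "TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL",
    "TEMP_OUTPUT_LABEL",
)


def missing_labels(values, expected):
    """Return the expected labels that are absent from `values`, in order."""
    return [label for label in expected if label not in values]


# Placeholder values, for illustration only.
inputs = {
    "COMPUTE_STATISTICS_LABEL": False,
    "SCHEMA_PATH_LABEL": "/tmp/schema/schema.pbtxt",
    "EXAMPLES_DATA_FORMAT_LABEL": "PLACEHOLDER_DATA_FORMAT",
    "ANALYZE_DATA_PATHS_LABEL": ["/tmp/examples/train/*"],
    "ANALYZE_PATHS_FILE_FORMATS_LABEL": ["PLACEHOLDER_FILE_FORMAT"],
    "TRANSFORM_DATA_PATHS_LABEL": ["/tmp/examples/train/*", "/tmp/examples/eval/*"],
    "TRANSFORM_PATHS_FILE_FORMATS_LABEL": ["PLACEHOLDER_FILE_FORMAT"] * 2,
    "TFT_STATISTICS_USE_TFDV_LABEL": True,
    "MODULE_FILE": "/tmp/preprocessing.py",  # optional
}
outputs = {
    "TRANSFORM_METADATA_OUTPUT_PATH_LABEL": "/tmp/transform_output",
    "TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL": ["/tmp/out/train", "/tmp/out/eval"],
    "TEMP_OUTPUT_LABEL": "/tmp/transform_tmp",
}

missing_inputs = missing_labels(inputs, INPUT_LABELS_NOT_MARKED_OPTIONAL)
missing_outputs = missing_labels(outputs, OUTPUT_LABELS_NOT_MARKED_OPTIONAL)
```

With the real labels constants substituted for the placeholder keys, dictionaries of this shape are what Transform() receives as its inputs and outputs arguments.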