Transform executor.

Inherits From: BaseExecutor

Child Classes

class Context




TensorFlow Transform executor entrypoint.

This implements BaseExecutor.Do() and is invoked by orchestration systems. It is not intended for manual use or further customization; use the Transform() function instead, which takes an input format with no artifact dependency.

input_dict: Input dict from input key to a list of artifacts, including:

  • input_data: A list of type standard_artifacts.Examples which should contain the custom splits specified in splits_config. If a custom split is not provided, this should contain the two splits 'train' and 'eval'.
  • schema: A list of type standard_artifacts.Schema which should contain a single schema artifact.
  • analyzer_cache: Cache input of 'tf.Transform', where cached information for analyzed examples from previous runs will be read.
output_dict: Output dict from key to a list of artifacts, including:
  • transform_output: Output of 'tf.Transform', which includes an exported TensorFlow graph suitable for both training and serving.
  • transformed_examples: Materialized transformed examples, which include the transform splits specified in splits_config. If a custom split is not provided, this should include both 'train' and 'eval' splits.
  • updated_analyzer_cache: Cache output of 'tf.Transform', where cached information for analyzed examples will be written.
exec_properties: A dict of execution properties, including:
  • module_file: The file path to a Python module file, from which the 'preprocessing_fn' function will be loaded.
  • preprocessing_fn: The module path to a Python function that implements 'preprocessing_fn'. Exactly one of 'module_file' and 'preprocessing_fn' should be set.
  • splits_config: A transform_pb2.SplitsConfig instance, providing the splits that should be analyzed and the splits that should be transformed. Note that the analyze and transform splits can overlap. The default behavior (when splits_config is not set) is to analyze the 'train' split and transform all splits. If splits_config is set, the analyze list cannot be empty.
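To make the module_file / preprocessing_fn contract above concrete, here is a minimal sketch of a module that module_file could point to. The function name preprocessing_fn is the one the executor loads; the arithmetic is a stand-in for real tensorflow_transform ops (e.g. tft.scale_to_z_score), so the sketch runs without TensorFlow installed.

```python
# Hypothetical user module that 'module_file' could point to.
# A real preprocessing_fn receives a dict of batched Tensors and uses
# tensorflow_transform analyzers; plain Python arithmetic is used here
# as a stand-in so the sketch is self-contained.

def preprocessing_fn(inputs):
    """Maps raw features to transformed features (dict in, dict out)."""
    outputs = dict(inputs)  # pass untouched features through
    # Stand-in for tft.scale_to_z_score: center 'x' on an assumed mean.
    ASSUMED_MEAN = 0.5  # a real pipeline computes this with an analyzer
    outputs['x_centered'] = inputs['x'] - ASSUMED_MEAN
    return outputs
```

In exec_properties, exactly one of module_file (pointing at a file like this) or preprocessing_fn (a dotted module path to the function) would be set.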



Executes on request.

This is the implementation part of the Transform executor. It is intended for use when invoking or extending the executor without artifact dependencies.

inputs: A dictionary of labelled input values, including:

  • labels.COMPUTE_STATISTICS_LABEL: Whether to compute statistics.
  • labels.SCHEMA_PATH_LABEL: Path to the schema file.
  • labels.EXAMPLES_DATA_FORMAT_LABEL: Example data format, one of the enums from example_gen_pb2.PayloadFormat.
  • labels.ANALYZE_DATA_PATHS_LABEL: Paths or path patterns to analyze data.
  • labels.ANALYZE_PATHS_FILE_FORMATS_LABEL: File formats of the paths to analyze data.
  • labels.TRANSFORM_DATA_PATHS_LABEL: Paths or path patterns to transform data.
  • labels.TRANSFORM_PATHS_FILE_FORMATS_LABEL: File formats of the paths to transform data.
  • labels.MODULE_FILE: Path to a Python module that contains the preprocessing_fn, optional.
  • labels.PREPROCESSING_FN: Path to a Python function that implements preprocessing_fn, optional.
  • labels.CUSTOM_CONFIG: Dictionary of additional parameters for preprocessing_fn, optional.
  • labels.DATA_VIEW_LABEL: DataView to be used to read the Example, optional.

outputs: A dictionary of labelled output values, including:

  • labels.PER_SET_STATS_OUTPUT_PATHS_LABEL: Paths to statistics output, optional.
  • labels.TRANSFORM_METADATA_OUTPUT_PATH_LABEL: A path to TFTransformOutput output.
  • labels.TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL: Paths to transform materialization.
  • labels.TEMP_OUTPUT_LABEL: A path to a temporary directory.

status_file: Where the status should be written (not yet implemented).
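As an illustration of the labelled-dict contract above, the sketch below builds inputs and outputs dicts with plain string keys standing in for the labels.* constants; every path is a made-up placeholder, not a real default.

```python
# Hypothetical labelled dicts for Transform(); the string keys stand in
# for the labels.* constants and all paths are made-up placeholders.
inputs = {
    'compute_statistics': False,                  # labels.COMPUTE_STATISTICS_LABEL
    'schema_path': '/tmp/schema/schema.pbtxt',    # labels.SCHEMA_PATH_LABEL
    'analyze_data_paths': ['/tmp/data/train/*'],  # labels.ANALYZE_DATA_PATHS_LABEL
    'transform_data_paths': [                     # labels.TRANSFORM_DATA_PATHS_LABEL
        '/tmp/data/train/*',
        '/tmp/data/eval/*',
    ],
    'module_file': '/tmp/user_module.py',         # labels.MODULE_FILE
}
outputs = {
    'transform_output_path': '/tmp/transform_output',  # labels.TRANSFORM_METADATA_OUTPUT_PATH_LABEL
    'materialize_output_paths': [                      # labels.TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL
        '/tmp/transformed/train',
        '/tmp/transformed/eval',
    ],
    'temp_path': '/tmp/tft_tmp',                       # labels.TEMP_OUTPUT_LABEL
}
```

Note how the analyze paths are a subset of the transform paths here, mirroring the default behavior of analyzing 'train' while transforming all splits.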