Transform executor.
Inherits From: BaseExecutor
tfx.components.transform.executor.Executor(
context: Optional[tfx.dsl.components.base.base_executor.BaseExecutor.Context] = None
)
Child Classes
class Context
Methods
Do
Do(
input_dict: Dict[Text, List[types.Artifact]],
output_dict: Dict[Text, List[types.Artifact]],
exec_properties: Dict[Text, Any]
) -> None
TensorFlow Transform executor entrypoint.
This implements BaseExecutor.Do() and is invoked by orchestration systems.
It is not intended for manual use or further customization. For those cases,
use the Transform() method, which takes inputs that have no artifact
dependency.
Args |
input_dict
|
Input dict from input key to a list of artifacts, including:
- input_data: A list of type standard_artifacts.Examples, which should
contain the custom splits specified in splits_config. If no custom splits
are provided, it should contain two splits, 'train' and 'eval'.
- schema: A list of type standard_artifacts.Schema, which should
contain a single schema artifact.
- analyzer_cache: Cache input of 'tf.Transform', where cached
information for analyzed examples from previous runs will be read.
|
output_dict
|
Output dict from output key to a list of artifacts, including:
- transform_output: Output of 'tf.Transform', which includes an exported
TensorFlow graph suitable for both training and serving.
- transformed_examples: Materialized transformed examples, which
include the transform splits specified in splits_config. If no custom
splits are provided, this should include both the 'train' and 'eval'
splits.
- updated_analyzer_cache: Cache output of 'tf.Transform', where
cached information for analyzed examples will be written.
|
exec_properties
|
A dict of execution properties, including:
- module_file: The file path to a Python module file, from which the
'preprocessing_fn' function will be loaded.
- preprocessing_fn: The module path to a Python function that
implements 'preprocessing_fn'. Exactly one of 'module_file' and
'preprocessing_fn' should be set.
- splits_config: A transform_pb2.SplitsConfig instance, providing the splits
that should be analyzed and the splits that should be transformed. Note
that the analyze and transform splits can overlap. The default behavior
(when splits_config is not set) is to analyze the 'train' split and
transform all splits. If splits_config is set, analyze cannot be empty.
- force_tf_compat_v1: Whether to use TF in compat.v1 mode
irrespective of installed/enabled TF behaviors.
|
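The two rules stated for exec_properties (exactly one source for 'preprocessing_fn', and the splits_config defaults) can be sketched in plain Python. This is a minimal illustration only, not the executor's actual code: the helper names check_preprocessing_source and resolve_splits are hypothetical, and a plain dict with 'analyze' and 'transform' lists stands in for a transform_pb2.SplitsConfig instance.

```python
from typing import Any, Dict, List, Optional, Tuple


def check_preprocessing_source(exec_properties: Dict[str, Any]) -> str:
    """Returns the key that supplies preprocessing_fn (hypothetical helper).

    Raises ValueError unless exactly one of the two keys is set.
    """
    set_keys = [key for key in ("module_file", "preprocessing_fn")
                if exec_properties.get(key)]
    if len(set_keys) != 1:
        raise ValueError(
            "Exactly one of 'module_file' and 'preprocessing_fn' should be "
            f"set, got: {set_keys}")
    return set_keys[0]


def resolve_splits(
    available_splits: List[str],
    splits_config: Optional[Dict[str, List[str]]] = None,
) -> Tuple[List[str], List[str]]:
    """Returns (analyze_splits, transform_splits) (hypothetical helper).

    splits_config is a plain-dict stand-in for transform_pb2.SplitsConfig.
    """
    if splits_config is None:
        # Documented default: analyze 'train' and transform all splits.
        return ["train"], list(available_splits)
    analyze = list(splits_config.get("analyze", []))
    transform = list(splits_config.get("transform", []))
    if not analyze:
        raise ValueError("If splits_config is set, analyze cannot be empty.")
    return analyze, transform


# Default behavior, with no splits_config.
default_analyze, default_transform = resolve_splits(["train", "eval"])

# Custom splits; the analyze and transform lists are allowed to overlap.
custom_analyze, custom_transform = resolve_splits(
    ["train", "eval", "test"],
    {"analyze": ["train", "eval"], "transform": ["eval", "test"]},
)

# The path below is a placeholder, not a real module.
source_key = check_preprocessing_source({"module_file": "/path/to/module.py"})
```

In a real pipeline these checks are performed by the component and executor themselves; the sketch only makes the documented defaults concrete.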
Transform
Transform(
inputs: Mapping[Text, Any],
outputs: Mapping[Text, Any],
status_file: Text
) -> None
Executes on request.
This is the implementation part of the Transform executor. It is intended for
using or extending the executor without an artifact dependency.
Args |
inputs
|
A dictionary of labelled input values, including:
- labels.COMPUTE_STATISTICS_LABEL: Whether to compute statistics.
- labels.SCHEMA_PATH_LABEL: Path to schema file.
- labels.EXAMPLES_DATA_FORMAT_LABEL: Example data format, one of the
enums from example_gen_pb2.PayloadFormat.
- labels.ANALYZE_DATA_PATHS_LABEL: Paths or path patterns to analyze
data.
- labels.ANALYZE_PATHS_FILE_FORMATS_LABEL: File formats of paths to
analyze data.
- labels.TRANSFORM_DATA_PATHS_LABEL: Paths or path patterns to transform
data.
- labels.TRANSFORM_PATHS_FILE_FORMATS_LABEL: File formats of paths to
transform data.
- labels.MODULE_FILE: Path to a Python module that contains the
preprocessing_fn, optional.
- labels.PREPROCESSING_FN: Path to a Python function that implements
preprocessing_fn, optional.
- labels.CUSTOM_CONFIG: Dictionary of additional parameters for
preprocessing_fn, optional.
- labels.DATA_VIEW_LABEL: DataView to be used to read the Example,
optional.
- labels.FORCE_TF_COMPAT_V1_LABEL: Whether to use TF in compat.v1 mode
irrespective of installed/enabled TF behaviors.
|
outputs
|
A dictionary of labelled output values, including:
- labels.PER_SET_STATS_OUTPUT_PATHS_LABEL: Paths to statistics output,
optional.
- labels.TRANSFORM_METADATA_OUTPUT_PATH_LABEL: A path to
TFTransformOutput output.
- labels.TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL: Paths to transform
materialization.
- labels.TEMP_OUTPUT_LABEL: A path to temporary directory.
|
status_file
|
Where the status should be written (not yet implemented)
|
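As a rough aid for assembling the inputs and outputs mappings described above, the following sketch checks a mapping for missing entries before a call. It rests on several assumptions: the key strings are placeholders named after the labels constants (the real string values are defined in the labels module and may differ), every entry not marked optional in the listing is treated as required, and missing_keys is a hypothetical helper rather than part of TFX.

```python
from typing import Any, List, Mapping

# Placeholder keys named after the labels constants in the listing above.
# Assumption: entries not marked "optional" in the listing are required.
REQUIRED_INPUTS = [
    "COMPUTE_STATISTICS_LABEL",
    "SCHEMA_PATH_LABEL",
    "EXAMPLES_DATA_FORMAT_LABEL",
    "ANALYZE_DATA_PATHS_LABEL",
    "ANALYZE_PATHS_FILE_FORMATS_LABEL",
    "TRANSFORM_DATA_PATHS_LABEL",
    "TRANSFORM_PATHS_FILE_FORMATS_LABEL",
    "FORCE_TF_COMPAT_V1_LABEL",
]
REQUIRED_OUTPUTS = [
    "TRANSFORM_METADATA_OUTPUT_PATH_LABEL",
    "TRANSFORM_MATERIALIZE_OUTPUT_PATHS_LABEL",
    "TEMP_OUTPUT_LABEL",
]


def missing_keys(values: Mapping[str, Any], required: List[str]) -> List[str]:
    """Returns the required keys absent from values (hypothetical helper)."""
    return [key for key in required if key not in values]


# Example mappings; every path and value here is a placeholder.
inputs = {
    "COMPUTE_STATISTICS_LABEL": False,
    "SCHEMA_PATH_LABEL": "/tmp/schema/schema.pbtxt",
    "EXAMPLES_DATA_FORMAT_LABEL": "placeholder_payload_format",
    "ANALYZE_DATA_PATHS_LABEL": ["/tmp/examples/train/*"],
    "ANALYZE_PATHS_FILE_FORMATS_LABEL": ["tfrecords_gzip"],
    "TRANSFORM_DATA_PATHS_LABEL": ["/tmp/examples/train/*",
                                   "/tmp/examples/eval/*"],
    "TRANSFORM_PATHS_FILE_FORMATS_LABEL": ["tfrecords_gzip",
                                           "tfrecords_gzip"],
    "MODULE_FILE": "/tmp/module.py",  # optional entry
}
outputs = {
    "TRANSFORM_METADATA_OUTPUT_PATH_LABEL": "/tmp/transform_output",
    "TEMP_OUTPUT_LABEL": "/tmp/scratch",
}

missing_inputs = missing_keys(inputs, REQUIRED_INPUTS)
missing_outputs = missing_keys(outputs, REQUIRED_OUTPUTS)
```

Here missing_inputs reports the compat.v1 flag and missing_outputs reports the materialization paths, so both mappings would need completing before being handed to Transform().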