tfx.components.Transform

A TFX component to transform the input examples.

Inherits From: BaseComponent


The Transform component wraps TensorFlow Transform (tf.Transform) to preprocess data in a TFX pipeline. This component will load the preprocessing_fn from the input module file, preprocess both the 'train' and 'eval' splits of the input examples, generate the tf.Transform output, and save both the transform function and the transformed examples to the locations requested by the orchestrator.

Providing a preprocessing function

The Transform executor will load the Python module provided in module_file and look specifically for the preprocessing_fn() function within that file, which it uses to transform the input examples.

An example of preprocessing_fn() can be found in the user-supplied code of the TFX Chicago Taxi pipeline example.

Example

# Performs transformations and feature engineering in training and serving.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    module_file=module_file)

Please see https://www.tensorflow.org/tfx/transform for more details.
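For context, a minimal sketch of the upstream components the snippet above assumes; exact constructor arguments vary across TFX releases, so treat this as illustrative rather than canonical:

from tfx.components import CsvExampleGen, SchemaGen, StatisticsGen

example_gen = CsvExampleGen(input_base=data_root)  # emits 'train' and 'eval' Examples
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
infer_schema = SchemaGen(statistics=statistics_gen.outputs['statistics'])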

Args

examples A Channel of type standard_artifacts.Examples (required). This should contain the two splits 'train' and 'eval'.
schema A Channel of type standard_artifacts.Schema. This should contain a single schema artifact.
module_file The file path to a python module file, from which the 'preprocessing_fn' function will be loaded. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.

The function needs to have the following signature:

def preprocessing_fn(inputs: Dict[Text, Any]) -> Dict[Text, Any]:
  ...

where the values of the input and returned Dicts are either tf.Tensor or tf.SparseTensor.
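For illustration, here is a minimal preprocessing_fn sketch using two common tf.Transform analyzers; the feature names 'dense_feature' and 'string_feature' are hypothetical placeholders, not part of this API:

import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Sketch only: the feature names below are hypothetical.
  outputs = {}
  # Full-pass analyzer: scale a numeric feature to zero mean and unit variance.
  outputs['dense_feature_scaled'] = tft.scale_to_z_score(inputs['dense_feature'])
  # Map string values to integer ids backed by a generated vocabulary.
  outputs['string_feature_id'] = tft.compute_and_apply_vocabulary(
      inputs['string_feature'])
  return outputs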

If additional inputs are needed for preprocessing_fn, they can be passed in custom_config:

def preprocessing_fn(inputs: Dict[Text, Any],
                     custom_config: Dict[Text, Any]) -> Dict[Text, Any]:
  ...
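A sketch of wiring custom_config through to preprocessing_fn; the 'num_buckets' key and 'numeric_feature' name are assumptions made for illustration:

import tensorflow_transform as tft

def preprocessing_fn(inputs, custom_config):
  # 'num_buckets' is a hypothetical key supplied via custom_config below.
  return {
      'numeric_feature_bucketized':
          tft.bucketize(inputs['numeric_feature'],
                        custom_config['num_buckets']),
  }

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    module_file=module_file,
    custom_config={'num_buckets': 10})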

preprocessing_fn The path to a python function that implements 'preprocessing_fn'. See 'module_file' for the expected signature of the function. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.
transform_graph Optional output 'TransformPath' channel for the output of 'tf.Transform', which includes an exported TensorFlow graph suitable for both training and serving.
transformed_examples Optional output 'ExamplesPath' channel for materialized transformed examples, which includes both 'train' and 'eval' splits.
input_data Backwards compatibility alias for the 'examples' argument.
instance_name Optional unique instance name. Required only if multiple Transform components are declared in the same pipeline.
materialize If True, write transformed examples as an output. If False, transformed_examples must not be provided.
custom_config A dict which contains additional parameters that will be passed to preprocessing_fn.

Raises

ValueError When both or neither of 'module_file' and 'preprocessing_fn' is supplied.

Attributes

component_id DEPRECATED FUNCTION (use id instead)

component_type DEPRECATED FUNCTION (use type instead)
downstream_nodes

exec_properties

id Node id, unique across all TFX nodes in a pipeline.

If instance name is available, node_id will be: <component_class_name>.<instance_name>; otherwise, node_id will be: <component_class_name>.
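For example (a sketch; 'my_transform' is an arbitrary instance name):

transform = Transform(..., instance_name='my_transform')
transform.id  # -> 'Transform.my_transform'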

inputs

outputs

type

upstream_nodes

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Methods

add_downstream_node

Experimental: Add another component that must run after this one.

This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.

Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.

It is symmetric with add_upstream_node.

Args
downstream_node a component that must run after this node.
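As a sketch, forcing an ordering between two components that exchange no artifacts; transform and my_validator stand in for any two component instances:

# Run my_validator only after transform completes, despite no data dependency.
transform.add_downstream_node(my_validator)
# The symmetric call from the other side would be:
# my_validator.add_upstream_node(transform)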

add_upstream_node

Experimental: Add another component that must run before this one.

This method enables task-based dependencies by enforcing execution order for synchronous pipelines on supported platforms. Currently, the supported platforms are Airflow, Beam, and Kubeflow Pipelines.

Note that this API call should be considered experimental, and may not work with asynchronous pipelines, sub-pipelines and pipelines with conditional nodes. We also recommend relying on data for capturing dependencies where possible to ensure data lineage is fully captured within MLMD.

It is symmetric with add_downstream_node.

Args
upstream_node a component that must run before this node.

from_json_dict

Convert from dictionary data to an object.

get_id

Gets the id of a node.

This can be used during pipeline authoring time. For example:

from tfx.components import Trainer

resolver = ResolverNode(
    ...,
    model=Channel(
        type=Model,
        producer_component_id=Trainer.get_id('my_trainer')))

Args
instance_name (Optional) instance name of a node. If given, the instance name will be taken into consideration when generating the id.

Returns
an id for the node.

to_json_dict

Convert from an object to a JSON serializable dictionary.
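These two methods implement TFX's Jsonable interface. A speculative round-trip sketch, assuming the tfx.utils.json_utils helpers that wrap this interface:

from tfx.utils import json_utils

serialized = json_utils.dumps(transform)  # serializes via to_json_dict()
restored = json_utils.loads(serialized)   # reconstructs via from_json_dict()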

Class Variables

  • EXECUTOR_SPEC