

A TFX component to transform the input examples.


The Transform component wraps TensorFlow Transform (tf.Transform) to preprocess data in a TFX pipeline. This component loads the preprocessing_fn from the input module file, preprocesses both the 'train' and 'eval' splits of the input examples, generates the tf.Transform output, and saves both the transform function and the transformed examples to the locations designated by the orchestrator.

Providing a preprocessing function

The Transform executor will load the preprocessing logic from the provided module_file. It will look specifically for the preprocessing_fn() function within that file.

An example of preprocessing_fn() can be found in the user-supplied code of the TFX Chicago Taxi pipeline example.


# Performs transformations and feature engineering in training and serving.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=module_file)

Component outputs contain:

  • transform_graph: Channel of type standard_artifacts.TransformGraph, which includes an exported TensorFlow graph suitable for both training and serving.
  • transformed_examples: Channel of type standard_artifacts.Examples for materialized transformed examples, which includes the transformed splits as specified in splits_config. This output is optional and is controlled by the materialize argument.

Please see the Transform guide for more details.

examples A Channel of type standard_artifacts.Examples (required). This should contain the custom splits specified in splits_config. If custom splits are not provided, it should contain the two splits 'train' and 'eval'.
schema A Channel of type standard_artifacts.Schema. This should contain a single schema artifact.
module_file The file path to a python module file, from which the 'preprocessing_fn' function will be loaded. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.

The function needs to have the following signature:

def preprocessing_fn(inputs: Dict[Text, Any]) -> Dict[Text, Any]:

where the values of input and returned Dict are either tf.Tensor or tf.SparseTensor.

If additional inputs are needed for preprocessing_fn, they can be passed in custom_config:

def preprocessing_fn(inputs: Dict[Text, Any], custom_config: Dict[Text, Any]) -> Dict[Text, Any]:

Use of a RuntimeParameter for this argument is experimental.

preprocessing_fn The path to a python function that implements a 'preprocessing_fn'. See 'module_file' for the expected signature of the function. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied. Use of a RuntimeParameter for this argument is experimental.
splits_config A transform_pb2.SplitsConfig instance, providing the splits that should be analyzed and the splits that should be transformed. Note that the analyze and transform splits can overlap. The default behavior (when splits_config is not set) is to analyze the 'train' split and transform all splits. If splits_config is set, analyze cannot be empty.
analyzer_cache Optional input 'TransformCache' channel containing cached information from previous Transform runs. When provided, Transform will try to use the cached calculations if possible.
materialize If True, write transformed examples as an output.
disable_analyzer_cache If False, Transform will use input cache if provided and write cache output. If True, analyzer_cache must not be provided.
force_tf_compat_v1 (Optional) If True, or if TF2 behaviors are disabled, Transform will use TensorFlow in compat.v1 mode irrespective of the installed version of TensorFlow. Defaults to False.
custom_config A dict which contains additional parameters that will be passed to preprocessing_fn.
compute_statistics Experimental. If True, invoke TFDV to compute pre-transform and post-transform statistics.

Raises ValueError when both or neither of 'module_file' and 'preprocessing_fn' is supplied.

outputs Component's output channel dict.



Add per-component Beam pipeline args.

beam_pipeline_args List of Beam pipeline args to be added to the Beam executor spec.

Returns the same component itself.