tfx.components.Transform

A TFX component to transform the input examples.

Inherits From: BaseComponent

tfx.components.Transform(
    examples=None, schema=None, module_file=None, preprocessing_fn=None,
    transform_graph=None, transformed_examples=None, input_data=None,
    instance_name=None
)

The Transform component wraps TensorFlow Transform (tf.Transform) to preprocess data in a TFX pipeline. It loads the preprocessing_fn from the input module file, preprocesses both the 'train' and 'eval' splits of the input examples, generates the tf.Transform output, and saves both the transform function and the transformed examples to the locations chosen by the orchestrator.

Providing a preprocessing function

The Transform executor loads the module given by module_file and looks specifically for the preprocessing_fn() function within that file, which it uses to transform the input examples.

An example of preprocessing_fn() can be found in the user-supplied code of the TFX Chicago Taxi pipeline example.
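
As a rough illustration of what such a module file might contain, the sketch below defines a preprocessing_fn() that scales numeric features and maps a string feature to integer ids. The feature names and the transformed_name() helper are hypothetical and are not taken from the Chicago Taxi example; tft.scale_to_z_score and tft.compute_and_apply_vocabulary are existing tf.Transform functions. The import sits inside the function only so that the sketch loads where tf.Transform is not installed; a real module file would normally import it at the top.

```python
from typing import Any, Dict, Text

# Hypothetical feature names, chosen for illustration only.
DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare']
VOCAB_FEATURE_KEYS = ['payment_type']


def transformed_name(key: Text) -> Text:
    """Returns the (hypothetical) output name for a transformed feature."""
    return key + '_xf'


def preprocessing_fn(inputs: Dict[Text, Any]) -> Dict[Text, Any]:
    """Maps raw feature tensors to transformed feature tensors."""
    # Imported here so this sketch loads even where tf.Transform is absent.
    import tensorflow_transform as tft

    outputs = {}
    for key in DENSE_FLOAT_FEATURE_KEYS:
        # Scale numeric features to zero mean and unit variance.
        outputs[transformed_name(key)] = tft.scale_to_z_score(inputs[key])
    for key in VOCAB_FEATURE_KEYS:
        # Build a vocabulary over the dataset and map strings to integer ids.
        outputs[transformed_name(key)] = tft.compute_and_apply_vocabulary(
            inputs[key])
    return outputs
```

The function name must be exactly preprocessing_fn, since that is the name the executor looks for in the module file.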

Example

# Performs transformations and feature engineering in training and serving.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    module_file=module_file)

Please see https://www.tensorflow.org/tfx/transform for more details.

Args:

  • examples: A Channel of type standard_artifacts.Examples (required). This should contain the two splits 'train' and 'eval'.
  • schema: A Channel of type standard_artifacts.Schema. This should contain a single schema artifact.
  • module_file: The file path to a Python module file, from which the 'preprocessing_fn' function will be loaded. The function must have the following signature:

    def preprocessing_fn(inputs: Dict[Text, Any]) -> Dict[Text, Any]: ...

    where the values of both the input Dict and the returned Dict are either tf.Tensor or tf.SparseTensor. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.

  • preprocessing_fn: The path to a Python function that implements a 'preprocessing_fn'. See 'module_file' for the expected signature of the function. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.

  • transform_graph: Optional output 'TransformPath' channel for the output of 'tf.Transform', which includes an exported TensorFlow graph suitable for both training and serving.

  • transformed_examples: Optional output 'ExamplesPath' channel for materialized transformed examples, which includes both 'train' and 'eval' splits.

  • input_data: Backwards compatibility alias for the 'examples' argument.

  • instance_name: Optional unique instance name. Required if and only if multiple Transform components are declared in the same pipeline.

Attributes:

  • component_id: DEPRECATED FUNCTION

  • component_type: DEPRECATED FUNCTION

  • downstream_nodes

  • exec_properties

  • id: Node id, unique across all TFX nodes in a pipeline.

    If instance name is available, node_id will be: <node_class_name>.<instance_name>; otherwise, node_id will be: <node_class_name>.

  • inputs

  • outputs

  • type

  • upstream_nodes

Raises:

  • ValueError: When both or neither of 'module_file' and 'preprocessing_fn' is supplied.
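
The "exactly one of" rule can be sketched as a small standalone check. The helper name check_preprocessing_source and its return value are hypothetical, written only to illustrate the documented behaviour; this is not the component's actual code.

```python
from typing import Optional


def check_preprocessing_source(module_file: Optional[str] = None,
                               preprocessing_fn: Optional[str] = None) -> str:
    """Hypothetical helper: enforces that exactly one source is supplied."""
    # Both set or both unset are equally invalid.
    if bool(module_file) == bool(preprocessing_fn):
        raise ValueError(
            "Exactly one of 'module_file' or 'preprocessing_fn' must be "
            "supplied.")
    # Report which argument carries the preprocessing function.
    return 'module_file' if module_file else 'preprocessing_fn'
```

Under this sketch, passing only module_file or only preprocessing_fn succeeds, while passing both or neither raises ValueError.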

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Methods

add_downstream_node

add_downstream_node(
    downstream_node
)

add_upstream_node

add_upstream_node(
    upstream_node
)

from_json_dict

@classmethod
from_json_dict(
    cls, dict_data
)

Convert from dictionary data to an object.

get_id

@classmethod
get_id(
    cls, instance_name=None
)

Gets the id of a node.

This can be used during pipeline authoring time. For example:

from tfx.components import Trainer

resolver = ResolverNode(
    ...,
    model=Channel(
        type=Model,
        producer_component_id=Trainer.get_id('my_trainer')))

Args:

  • instance_name: (Optional) instance name of a node. If given, the instance name will be taken into consideration when generating the id.

Returns:

An id for the node.
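
One way to picture how an instance name might be folded into the id is the toy class below. It assumes the id is the class name, with the instance name appended after a dot when one is given; both the SketchNode base class and that format are assumptions made for illustration, and the real TFX implementation may differ.

```python
from typing import Optional


class SketchNode:
    """Hypothetical stand-in for a TFX node base class."""

    @classmethod
    def get_id(cls, instance_name: Optional[str] = None) -> str:
        # Assumed format: the class name, plus '.<instance_name>' if given.
        node_class_name = cls.__name__
        if instance_name:
            return '{}.{}'.format(node_class_name, instance_name)
        return node_class_name


class Trainer(SketchNode):
    """Toy subclass named after the component used in the example above."""


named_id = Trainer.get_id('my_trainer')
default_id = Trainer.get_id()
```

Because get_id() is a classmethod, the id can be computed before any component instance exists, which is what makes it usable while a pipeline is still being authored.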

to_json_dict

to_json_dict()

Convert from an object to a JSON serializable dictionary.
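
The relationship between to_json_dict() and from_json_dict() can be sketched as a round trip on a toy class. The SketchNode class and its two fields are hypothetical; the sketch only shows the general pattern of converting an object to a JSON-serializable dictionary and back, not how TFX serializes a component.

```python
import json


class SketchNode:
    """Hypothetical stand-in for a node; not the TFX implementation."""

    def __init__(self, instance_name=None, module_file=None):
        self.instance_name = instance_name
        self.module_file = module_file

    def to_json_dict(self):
        # Only plain, JSON-serializable values are stored on this toy class.
        return {'instance_name': self.instance_name,
                'module_file': self.module_file}

    @classmethod
    def from_json_dict(cls, dict_data):
        # Rebuild the object from the dictionary produced by to_json_dict().
        return cls(**dict_data)


node = SketchNode(instance_name='my_transform', module_file='preprocessing.py')
serialized = json.dumps(node.to_json_dict())
restored = SketchNode.from_json_dict(json.loads(serialized))
```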

Class Variables

  • EXECUTOR_SPEC