tfx.components.Transform

Class Transform

A TFX component to transform the input examples.

Inherits From: BaseComponent

Aliases: tfx.components.transform.component.Transform

The Transform component wraps TensorFlow Transform (tf.Transform) to preprocess data in a TFX pipeline. The component loads the preprocessing_fn from the input module file, preprocesses both the 'train' and 'eval' splits of the input examples, generates the tf.Transform output, and saves both the transform function and the transformed examples to the locations chosen by the orchestrator.

Providing a preprocessing function

The Transform executor uses the preprocessing function provided in the module_file file to transform the input examples. It looks specifically for a function named preprocessing_fn() within that file.
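The lookup by name can be illustrated with the standard library alone. This is a minimal sketch of loading a function called preprocessing_fn from a Python file, not the TFX executor's actual code; the helper load_preprocessing_fn and the throwaway demo module are hypothetical names used only for illustration.

```python
import importlib.util
import os
import tempfile


def load_preprocessing_fn(module_path):
    """Loads the function named 'preprocessing_fn' from a Python file."""
    spec = importlib.util.spec_from_file_location("user_module", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Raises AttributeError if the file does not define preprocessing_fn.
    return getattr(module, "preprocessing_fn")


# Write a throwaway module file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_module_file = os.path.join(tmp_dir, "demo_module.py")
    with open(demo_module_file, "w") as f:
        f.write("def preprocessing_fn(inputs):\n    return dict(inputs)\n")
    loaded_fn = load_preprocessing_fn(demo_module_file)
    loaded_result = loaded_fn({"fare": 12.5})
```

The point of the sketch is that the file must expose the function under exactly that name; any other name would not be found.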

An example of preprocessing_fn() can be found in the user-supplied code of the TFX Chicago Taxi pipeline example.
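For orientation, the shape of such a function might look like the sketch below. It is not the Chicago Taxi code: the feature names and the '_xf' output suffix are assumptions made for illustration, and the body only passes values through so that the block runs without TensorFlow. In a real module, each value would typically be transformed with a tf.Transform operation such as tft.scale_to_z_score.

```python
from typing import Any, Dict

# Hypothetical feature names, chosen only for this illustration.
FEATURE_KEYS = ["trip_miles", "fare"]


def transformed_name(key: str) -> str:
    """Returns the output name for an input feature (assumed convention)."""
    return key + "_xf"


def preprocessing_fn(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Maps a dict of input features to a dict of transformed features."""
    outputs = {}
    for key in FEATURE_KEYS:
        # Pass-through placeholder. A real pipeline would transform the value
        # here, for example with tft.scale_to_z_score(inputs[key]).
        outputs[transformed_name(key)] = inputs[key]
    return outputs


demo_outputs = preprocessing_fn({"trip_miles": 1.0, "fare": 2.0, "unused": 3})
```

Features not listed in FEATURE_KEYS are dropped from the output, which mirrors how a preprocessing function decides which features reach the model.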

Example

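from tfx.components import Transform

# Performs transformations and feature engineering in training and serving.
# example_gen and infer_schema are upstream components defined earlier in the
# pipeline; module_file is the path to the file containing preprocessing_fn().
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    module_file=module_file)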

Please see https://www.tensorflow.org/tfx/transform for more details.

__init__

__init__(
    examples=None,
    schema=None,
    module_file=None,
    preprocessing_fn=None,
    transform_graph=None,
    transformed_examples=None,
    input_data=None,
    instance_name=None
)

Construct a Transform component.

Args:

  • examples: A Channel of 'ExamplesPath' type (required). This should contain the two splits 'train' and 'eval'.
  • schema: A Channel of 'SchemaPath' type. This should contain a single schema artifact.
  • module_file: The file path to a Python module file, from which the 'preprocessing_fn' function will be loaded. The function must have the following signature:

    def preprocessing_fn(inputs: Dict[Text, Any]) -> Dict[Text, Any]: ...

    where the values of the input and returned Dict are either tf.Tensor or tf.SparseTensor. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.

  • preprocessing_fn: The path to a Python function that implements a 'preprocessing_fn'. See 'module_file' for the expected signature of the function. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.

  • transform_graph: Optional output 'TransformPath' channel for the output of 'tf.Transform', which includes an exported TensorFlow graph suitable for both training and serving.

  • transformed_examples: Optional output 'ExamplesPath' channel for materialized transformed examples, which includes both 'train' and 'eval' splits.

  • input_data: Backwards compatibility alias for the 'examples' argument.

  • instance_name: Optional unique instance name. Necessary if and only if multiple Transform components are declared in the same pipeline.

Raises:

  • ValueError: When both or neither of 'module_file' and 'preprocessing_fn' is supplied.
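The "exactly one of" rule can be sketched as a small check. This is an illustration of the documented behavior under stated assumptions, not the component's source; the helper name check_exactly_one is hypothetical.

```python
def check_exactly_one(module_file=None, preprocessing_fn=None):
    """Returns the single supplied argument, or raises ValueError."""
    supplied = [arg for arg in (module_file, preprocessing_fn) if arg]
    if len(supplied) != 1:
        # Covers both failure cases: both supplied, or neither supplied.
        raise ValueError(
            "Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.")
    return supplied[0]


def raises_value_error(**kwargs):
    """Reports whether check_exactly_one rejects the given arguments."""
    try:
        check_exactly_one(**kwargs)
    except ValueError:
        return True
    return False
```

For example, passing only module_file is accepted, while passing both arguments or neither is rejected.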

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Properties

component_id

DEPRECATED FUNCTION

component_type

DEPRECATED FUNCTION

downstream_nodes

exec_properties

id

Node id, unique across all TFX nodes in a pipeline.

If an instance name is available, the node id is the node class name followed by a dot and the instance name; otherwise, it is the node class name alone.

Returns:

node id.
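A minimal sketch of how such an id could be derived, assuming it is the class name alone, or the class name and instance name joined by a dot when an instance name is set. The helper make_node_id is hypothetical and not part of TFX.

```python
from typing import Optional


def make_node_id(class_name: str, instance_name: Optional[str] = None) -> str:
    """Builds a node id from a class name and an optional instance name."""
    if instance_name:
        # Assumed format: class name, a dot, then the instance name.
        return "{}.{}".format(class_name, instance_name)
    return class_name


default_id = make_node_id("Transform")
named_id = make_node_id("Transform", "taxi")
```

Under this assumption, two Transform components in one pipeline get distinct ids only if at least one of them is given an instance name.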

inputs

outputs

type

upstream_nodes

Methods

add_downstream_node

add_downstream_node(downstream_node)

add_upstream_node

add_upstream_node(upstream_node)

from_json_dict

from_json_dict(
    cls,
    dict_data
)

Convert from dictionary data to an object.

to_json_dict

to_json_dict()

Convert from an object to a JSON serializable dictionary.
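The pairing of to_json_dict and from_json_dict suggests a round trip through a plain dictionary. The sketch below shows that pattern on a hypothetical DemoNode class; the fields and the dictionary layout are assumptions for illustration and do not reflect how TFX serializes its components.

```python
import json


class DemoNode:
    """Hypothetical stand-in for an object that supports JSON round trips."""

    def __init__(self, name, properties):
        self.name = name
        self.properties = properties

    def to_json_dict(self):
        """Converts this object to a JSON-serializable dictionary."""
        return {"name": self.name, "properties": dict(self.properties)}

    @classmethod
    def from_json_dict(cls, dict_data):
        """Rebuilds an object from a dictionary produced by to_json_dict."""
        return cls(name=dict_data["name"], properties=dict_data["properties"])


original = DemoNode("Transform", {"module_file": "module.py"})
serialized = json.dumps(original.to_json_dict())
restored = DemoNode.from_json_dict(json.loads(serialized))
```

Because from_json_dict is a class method, it can be called on the class itself without an existing instance, which matches the cls parameter in the signature above.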

Class Members

  • EXECUTOR_SPEC