tfx.components.SchemaGen

A TFX SchemaGen component to generate a schema from the training data.

Inherits From: BaseComponent

tfx.components.SchemaGen(
    statistics=None, infer_feature_shape=False, output=None, stats=None,
    instance_name=None
)

The SchemaGen component uses TensorFlow Data Validation to generate a schema from input statistics. The following TFX libraries use the schema:

  • TensorFlow Data Validation
  • TensorFlow Transform
  • TensorFlow Model Analysis

In a typical TFX pipeline, the SchemaGen component generates a schema that is consumed by the other pipeline components.

Please see https://www.tensorflow.org/tfx/data_validation for more details.
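To make the idea of deriving a schema from statistics concrete, here is a minimal pure-Python sketch. It is not the TensorFlow Data Validation algorithm: the function infer_toy_schema, the statistics dictionary layout, and the rule used for shapes are all assumptions made for illustration only.

```python
from typing import Any, Dict


def infer_toy_schema(stats: Dict[str, Dict[str, Any]],
                     infer_feature_shape: bool = False) -> Dict[str, Dict[str, Any]]:
    """Toy stand-in for schema inference; NOT the TFDV implementation."""
    schema = {}
    for name, s in stats.items():
        # A feature is treated as required when it is never missing.
        entry = {"type": s["type"], "required": s["num_missing"] == 0}
        # Assumed rule: record a fixed shape only when asked to, the feature
        # is always present, and every example has the same number of values.
        if (infer_feature_shape and entry["required"]
                and s["min_values"] == s["max_values"]):
            entry["shape"] = [s["min_values"]]
        schema[name] = entry
    return schema


# Hypothetical statistics for a train split.
train_stats = {
    "age": {"type": "INT", "num_missing": 0, "min_values": 1, "max_values": 1},
    "tags": {"type": "BYTES", "num_missing": 3, "min_values": 0, "max_values": 5},
}
schema = infer_toy_schema(train_stats, infer_feature_shape=True)
```

In this sketch only "age" receives a shape, because "tags" is sometimes missing and has a varying number of values. Leaving infer_feature_shape at its default of False records no shapes at all.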

Example

  from tfx.components import SchemaGen

  # Generates schema based on statistics files.
  infer_schema = SchemaGen(statistics=statistics_gen.outputs['statistics'])

Args:

  • statistics: A Channel of ExampleStatistics type (required if spec is not passed). This should contain at least a train split. Other splits are currently ignored.
  • infer_feature_shape: Boolean (or RuntimeParameter) value indicating whether to infer the shape of features. If the feature shape is not inferred, the downstream TensorFlow Transform component that uses the schema will parse its input as tf.SparseTensor.
  • output: Output Schema channel for schema result.
  • stats: Backwards compatibility alias for the 'statistics' argument.
  • instance_name: Optional name assigned to this specific instance of SchemaGen. Required only if multiple SchemaGen components are declared in the same pipeline.

Either statistics or stats must be present in the input arguments.
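The relationship between statistics and its backwards-compatibility alias stats can be sketched as follows. This is a hypothetical helper (resolve_statistics is not part of TFX) that only illustrates the documented rule that one of the two arguments must be supplied. Preferring statistics when both are given is an assumption.

```python
from typing import Any, Optional


def resolve_statistics(statistics: Optional[Any] = None,
                       stats: Optional[Any] = None) -> Any:
    """Hypothetical helper: return whichever argument was supplied."""
    # Assumption: the newer name wins when both are passed.
    resolved = statistics if statistics is not None else stats
    if resolved is None:
        raise ValueError("Either statistics or stats must be provided.")
    return resolved
```

Calling resolve_statistics(stats=channel) and resolve_statistics(statistics=channel) returns the same channel, while calling it with no arguments raises ValueError.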

Attributes:

  • component_id: DEPRECATED FUNCTION

  • component_type: DEPRECATED FUNCTION

  • downstream_nodes

  • exec_properties

  • id: Node id, unique across all TFX nodes in a pipeline.

    If instance name is available, node_id will be: . otherwise, node_id will be:

  • inputs

  • outputs

  • type

  • upstream_nodes

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Methods

add_downstream_node

add_downstream_node(
    downstream_node
)

add_upstream_node

add_upstream_node(
    upstream_node
)

from_json_dict

@classmethod
from_json_dict(
    cls, dict_data
)

Convert from dictionary data to an object.

get_id

@classmethod
get_id(
    cls, instance_name=None
)

Gets the id of a node.

This can be used during pipeline authoring time. For example:

  from tfx.components import Trainer

  resolver = ResolverNode(..., model=Channel(
      type=Model, producer_component_id=Trainer.get_id('my_trainer')))

Args:

  • instance_name: (Optional) instance name of a node. If given, the instance name will be taken into consideration when generating the id.

Returns:

An id for the node.
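As a rough illustration of a class-level id lookup that takes an optional instance name into account, consider the sketch below. ToyNode and ToyTrainer are hypothetical classes, not TFX classes, and the id format used here (class name, optionally followed by a dot and the instance name) is an assumption rather than something this page confirms.

```python
from typing import Optional


class ToyNode:
    """Hypothetical stand-in for a pipeline node; not a TFX class."""

    @classmethod
    def get_id(cls, instance_name: Optional[str] = None) -> str:
        # Assumed format: the class name, plus ".<instance_name>" if given.
        if instance_name:
            return f"{cls.__name__}.{instance_name}"
        return cls.__name__


class ToyTrainer(ToyNode):
    pass


print(ToyTrainer.get_id())              # prints "ToyTrainer"
print(ToyTrainer.get_id("my_trainer"))  # prints "ToyTrainer.my_trainer"
```

Because get_id is a classmethod, the id can be computed before any node instance exists, which is what makes it usable while a pipeline is still being authored.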

to_json_dict

View source

to_json_dict()

Convert from an object to a JSON serializable dictionary.
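The pairing of to_json_dict and from_json_dict suggests a serialization round trip. A minimal sketch, assuming a hypothetical ToyJsonable class that only borrows the two method names and does not reproduce how TFX serializes components, might look like this:

```python
import json
from typing import Any, Dict


class ToyJsonable:
    """Hypothetical class; only mirrors the to_json_dict/from_json_dict names."""

    def __init__(self, name: str, params: Dict[str, Any]):
        self.name = name
        self.params = params

    def to_json_dict(self) -> Dict[str, Any]:
        # Return plain built-in types so json.dumps can handle the result.
        return {"name": self.name, "params": dict(self.params)}

    @classmethod
    def from_json_dict(cls, dict_data: Dict[str, Any]) -> "ToyJsonable":
        # Rebuild an object from the dictionary produced by to_json_dict.
        return cls(name=dict_data["name"], params=dict_data["params"])


original = ToyJsonable("schema_gen", {"infer_feature_shape": False})
payload = json.dumps(original.to_json_dict())
restored = ToyJsonable.from_json_dict(json.loads(payload))
```

After the round trip, restored carries the same name and params as original.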

Class Variables

  • EXECUTOR_SPEC