tfx.components.FileBasedExampleGen

A TFX component to ingest examples from a file system.

Inherits From: BaseComponent

tfx.components.FileBasedExampleGen(
    input=None, input_config=None, output_config=None, custom_config=None,
    example_artifacts=None, custom_executor_spec=None, input_base=None,
    instance_name=None
)

The FileBasedExampleGen component is an API for getting file-based records into TFX pipelines. It consumes external files to generate examples that other components, such as StatisticsGen or Trainer, use. The component also converts the input data to TFRecord format and generates train and eval example splits for downstream components.

Example

import os

from tfx.components import FileBasedExampleGen
from tfx.utils.dsl_utils import external_input

# Location of the input data files: $HOME/taxi/data/simple.
_taxi_root = os.path.join(os.environ['HOME'], 'taxi')
_data_root = os.path.join(_taxi_root, 'data', 'simple')
# Brings data into the pipeline or otherwise joins/converts training data.
example_gen = FileBasedExampleGen(input=external_input(_data_root))

Args:

  • input: Required. A Channel of type standard_artifacts.ExternalArtifact, which includes one artifact whose uri is an external directory containing the data files.
  • input_config: An example_gen_pb2.Input instance, providing input configuration. If unset, the files under input_base will be treated as a single dataset.
  • output_config: An example_gen_pb2.Output instance, providing the output configuration. If unset, the default splits are 'train' and 'eval' with a 2:1 size ratio.
  • custom_config: An optional example_gen_pb2.CustomConfig instance, providing custom configuration for executor.
  • example_artifacts: Channel of 'ExamplesPath' for output train and eval examples.
  • custom_executor_spec: Optional custom executor spec overriding the default executor spec specified in the component attribute.
  • input_base: Backwards compatibility alias for the 'input' argument.
  • instance_name: Optional unique instance name. Required only if multiple ExampleGen components are declared in the same pipeline.

Either input or input_base must be present in the input arguments.
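
The default 2:1 train/eval split described for output_config can be pictured as a deterministic bucketing of records. The following is a minimal sketch in plain Python, not the TFX implementation: the function name assign_split, the use of MD5, and the three-bucket layout are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def assign_split(record: bytes, train_buckets: int = 2, eval_buckets: int = 1) -> str:
    """Deterministically map a serialized record to 'train' or 'eval'.

    Hypothetical helper: it only illustrates the idea of a 2:1 split.
    """
    total = train_buckets + eval_buckets
    # Use a stable hash (not Python's salted hash()) so that the same record
    # always lands in the same split across runs.
    digest = hashlib.md5(record).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    return "train" if bucket < train_buckets else "eval"


# Made-up records standing in for rows read from the input files.
records = [f"row-{i}".encode() for i in range(3000)]
counts = Counter(assign_split(r) for r in records)
print(counts)
```

With enough records, roughly two thirds should fall into 'train' and one third into 'eval'. Because the assignment depends only on the record bytes, rerunning the split on the same data gives the same partition.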

Attributes:

  • component_id: DEPRECATED FUNCTION

  • component_type: DEPRECATED FUNCTION

  • downstream_nodes

  • exec_properties

  • id: Node id, unique across all TFX nodes in a pipeline.

    If an instance name is available, node_id is the node class name followed by a period and the instance name; otherwise, node_id is the node class name alone.

  • inputs

  • outputs

  • type

  • upstream_nodes

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Methods

add_downstream_node

add_downstream_node(
    downstream_node
)

add_upstream_node

add_upstream_node(
    upstream_node
)

from_json_dict

@classmethod
from_json_dict(
    cls, dict_data
)

Convert from dictionary data to an object.

get_id

@classmethod
get_id(
    cls, instance_name=None
)

Gets the id of a node.

This can be used during pipeline authoring time. For example:

from tfx.components import Trainer

resolver = ResolverNode(..., model=Channel(
    type=Model, producer_component_id=Trainer.get_id('my_trainer')))

Args:

  • instance_name: (Optional) instance name of a node. If given, the instance name will be taken into consideration when generating the id.

Returns:

An id for the node.
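
To make the effect of instance_name concrete, here is a minimal sketch of how such an id could be built. It assumes the id is the node class name, with a period and the instance name appended when one is given. SketchNode and SketchTrainer are hypothetical stand-ins, not TFX classes.

```python
from typing import Optional


class SketchNode:
    """Hypothetical stand-in for a pipeline node; not the real TFX base class."""

    @classmethod
    def get_id(cls, instance_name: Optional[str] = None) -> str:
        # Assumed rule: class name alone, or class name + '.' + instance name.
        if instance_name:
            return f"{cls.__name__}.{instance_name}"
        return cls.__name__


class SketchTrainer(SketchNode):
    """Hypothetical subclass, used only to show that the class name is picked up."""


print(SketchTrainer.get_id())
print(SketchTrainer.get_id("my_trainer"))
```

Because get_id is a classmethod, the id can be computed without constructing the component. This is what lets it be referenced from another node while the pipeline is still being authored.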

to_json_dict

to_json_dict()

Convert from an object to a JSON serializable dictionary.
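
The pairing of to_json_dict and from_json_dict suggests a round trip through JSON. The sketch below shows that general pattern on a hypothetical class. SketchSpec and its fields are invented for illustration and say nothing about which fields TFX actually serializes.

```python
import json
from typing import Any, Dict


class SketchSpec:
    """Hypothetical object exposing the same two method names; not a TFX class."""

    def __init__(self, name: str, splits: Dict[str, int]):
        self.name = name
        self.splits = splits

    def to_json_dict(self) -> Dict[str, Any]:
        # Return only built-in types so that json.dumps can handle the result.
        return {"name": self.name, "splits": dict(self.splits)}

    @classmethod
    def from_json_dict(cls, dict_data: Dict[str, Any]) -> "SketchSpec":
        # Rebuild the object from the dictionary produced by to_json_dict.
        return cls(name=dict_data["name"], splits=dict_data["splits"])


original = SketchSpec("example_gen", {"train": 2, "eval": 1})
payload = json.dumps(original.to_json_dict())
restored = SketchSpec.from_json_dict(json.loads(payload))
```

Keeping the dictionary limited to JSON-native types means the serialized form can be stored or sent between processes and later rebuilt into an equivalent object.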

Class Variables

  • EXECUTOR_SPEC