tfx.components.example_gen.component.FileBasedExampleGen


Class FileBasedExampleGen

A TFX component to ingest examples from a file system.

Inherits From: BaseComponent

The FileBasedExampleGen component is an API for getting file-based records into TFX pipelines. It consumes external files to generate examples that are used by other components such as StatisticsGen and Trainer. It also converts the input data to TFRecord files and generates the train and eval example splits that downstream components consume.

Example

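import os
from tfx.components import CsvExampleGen
from tfx.utils.dsl_utils import external_input
_taxi_root = os.path.join(os.environ['HOME'], 'taxi')
_data_root = os.path.join(_taxi_root, 'data', 'simple')
# Wrap the data directory in a Channel and bring the CSV data into the pipeline.
examples = external_input(_data_root)
example_gen = CsvExampleGen(input_base=examples)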

__init__


__init__(
    input_base=None,
    input_config=None,
    output_config=None,
    custom_config=None,
    example_artifacts=None,
    custom_executor_spec=None,
    input=None,
    instance_name=None
)

Construct a FileBasedExampleGen component.

Args:

  • input_base: A Channel of 'ExternalPath' type containing one artifact whose uri is an external directory holding the data files. Required unless input is provided.
  • input_config: An example_gen_pb2.Input instance providing the input configuration. If unset, the files under input_base are treated as a single dataset.
  • output_config: An example_gen_pb2.Output instance providing the output configuration. If unset, the default splits are 'train' and 'eval' with a 2:1 size ratio.
  • custom_config: An optional example_gen_pb2.CustomConfig instance providing custom configuration for the executor.
  • example_artifacts: A Channel of 'ExamplesPath' type for the output train and eval examples.
  • custom_executor_spec: Optional custom executor spec overriding the default executor spec specified in the component attribute.
  • input: Future replacement of the 'input_base' argument.
  • instance_name: Optional unique instance name. Required only if multiple ExampleGen components are declared in the same pipeline.
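The default 2:1 train/eval ratio in output_config is expressed as hash buckets per split (2 for 'train', 1 for 'eval'). The sketch below shows how deterministic hash bucketing yields such a ratio. It is an illustration only, not the TFX executor: split_by_hash is a hypothetical helper, and md5 is an assumed stand-in for whatever fingerprint function the real executor applies to each record.

```python
import hashlib
from typing import Dict, List, Sequence


def split_by_hash(records: Sequence[bytes],
                  buckets: Dict[str, int]) -> Dict[str, List[bytes]]:
    """Hypothetical helper: assign each record to a split by hash bucket.

    `buckets` maps split name -> number of hash buckets, e.g.
    {'train': 2, 'eval': 1} for a 2:1 ratio.
    """
    total = sum(buckets.values())
    # Exclusive upper bound of each split's bucket range, in insertion order.
    bounds = []
    upper = 0
    for name, count in buckets.items():
        upper += count
        bounds.append((upper, name))
    splits: Dict[str, List[bytes]] = {name: [] for name in buckets}
    for record in records:
        # md5 is an assumed stand-in fingerprint; the same record
        # always maps to the same bucket, so splits are reproducible.
        digest = hashlib.md5(record).digest()
        bucket = int.from_bytes(digest[:8], 'big') % total
        for bound, name in bounds:
            if bucket < bound:
                splits[name].append(record)
                break
    return splits


records = [f'row-{i}'.encode() for i in range(3000)]
result = split_by_hash(records, {'train': 2, 'eval': 1})
print({name: len(rows) for name, rows in result.items()})
```

Because assignment depends only on each record's bytes, rerunning the pipeline on the same data reproduces the same splits, and the ratio is approximate rather than exact.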

Either input_base or input must be present in the input arguments.
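The rule that at least one of input_base or input must be supplied can be sketched as a small validation step. resolve_input_base is a hypothetical helper, not part of the TFX API, and the precedence it gives input when both arguments are set is an assumption chosen only because input is described as the future replacement.

```python
from typing import Any, Optional


def resolve_input_base(input_base: Optional[Any] = None,
                       input: Optional[Any] = None) -> Any:
    """Hypothetical helper: return whichever input argument was given.

    Preferring `input` when both are set is an assumption for this sketch.
    """
    resolved = input if input is not None else input_base
    if resolved is None:
        raise ValueError('Either input_base or input must be provided.')
    return resolved


print(resolve_input_base(input_base='/data/csv'))
```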

Child Classes

class DRIVER_CLASS

class SPEC_CLASS

Properties

component_id

Component id, unique across all component instances in a pipeline.

If a unique name is available, component_id will be <component_class_name>.<unique_name>; otherwise, component_id will be <component_class_name>.

Returns:

component id.
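Assuming component_id is the component class name, followed by a dot and the instance name when one is set, the format can be sketched as below. This also shows why instance_name is needed when a pipeline declares several ExampleGen components: without it, two CsvExampleGen instances would share the same id. make_component_id is a hypothetical helper, not a TFX function.

```python
from typing import Optional


def make_component_id(class_name: str,
                      instance_name: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the assumed component_id format."""
    if instance_name:
        # Class name plus instance name keeps ids unique per pipeline.
        return f'{class_name}.{instance_name}'
    return class_name


print(make_component_id('CsvExampleGen'))            # CsvExampleGen
print(make_component_id('CsvExampleGen', 'hourly'))  # CsvExampleGen.hourly
```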

component_type

downstream_nodes

exec_properties

inputs

outputs

upstream_nodes

Methods

add_downstream_node


add_downstream_node(downstream_node)

add_upstream_node


add_upstream_node(upstream_node)
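add_upstream_node and add_downstream_node each name one direction of a dependency edge, and the upstream_nodes and downstream_nodes properties expose the result. The sketch below assumes each method records the edge only on the node it is called on, so linking two components means calling both. Node and connect are hypothetical stand-ins, not TFX classes.

```python
from typing import Set


class Node:
    """Hypothetical stand-in for a pipeline node; not the TFX class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.upstream_nodes: Set['Node'] = set()
        self.downstream_nodes: Set['Node'] = set()

    def add_upstream_node(self, upstream_node: 'Node') -> None:
        # Records the edge on this node only.
        self.upstream_nodes.add(upstream_node)

    def add_downstream_node(self, downstream_node: 'Node') -> None:
        # Records the edge on this node only.
        self.downstream_nodes.add(downstream_node)


def connect(upstream: Node, downstream: Node) -> None:
    """Hypothetical helper: record one dependency edge on both endpoints."""
    upstream.add_downstream_node(downstream)
    downstream.add_upstream_node(upstream)


example_gen = Node('CsvExampleGen')
statistics_gen = Node('StatisticsGen')
connect(example_gen, statistics_gen)
print([n.name for n in statistics_gen.upstream_nodes])  # ['CsvExampleGen']
```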

Class Members

  • EXECUTOR_SPEC