
Data Validation using TFX Pipeline and TensorFlow Data Validation

In this notebook-based tutorial, we will create and run TFX pipelines to validate input data and create an ML model. This notebook is based on the TFX pipeline we built in the Simple TFX Pipeline Tutorial. If you have not read that tutorial yet, you should read it before proceeding with this notebook.

The first task in any data science or ML project is to understand and clean the data, which includes the following (a minimal TFDV sketch of these steps follows the list):

  • Understanding the data types, distributions, and other information (e.g., mean value or number of unique values) about each feature
  • Generating a preliminary schema that describes the data
  • Identifying anomalies and missing values in the data with respect to the given schema
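
These first steps can also be performed directly with the TensorFlow Data Validation (TFDV) library, outside of a TFX pipeline. The following is only a minimal sketch, assuming TFDV is installed and that a local CSV file named data.csv exists; in this tutorial the same steps are run as TFX pipeline components instead.

import tensorflow_data_validation as tfdv

# 1. Compute statistics (types, distributions, means, unique value counts, ...).
stats = tfdv.generate_statistics_from_csv(data_location='data.csv')
tfdv.visualize_statistics(stats)

# 2. Infer a preliminary schema that describes the data.
schema = tfdv.infer_schema(statistics=stats)
tfdv.display_schema(schema)

# 3. Identify anomalies and missing values with respect to the schema.
anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
tfdv.display_anomalies(anomalies)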

In this tutorial, we will create two TFX pipelines.

First, we will create a pipeline to analyze the dataset and generate a preliminary schema of the given dataset. This pipeline will include two new components, StatisticsGen and SchemaGen.

Once we have a proper schema of the data, we will create a pipeline to train an ML classification model based on the pipeline from the previous tutorial. In this pipeline, we will use the schema from the first pipeline and a new component, ExampleValidator, to validate the input data.

The three new components, StatisticsGen, SchemaGen and ExampleValidator, are TFX components for data analysis and validation, and they are implemented using the TensorFlow Data Validation library.

Please see Understanding TFX Pipelines to learn more about various concepts in TFX.

Set Up

We first need to install the TFX Python package and download the dataset which we will use for our model.

Upgrade Pip

To avoid upgrading Pip in a system when running locally, check to make sure that we are running in Colab. Local systems can of course be upgraded separately.

try:
  import colab
  !pip install --upgrade pip
except:
  pass

Install TFX

!pip install -U tfx

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime by clicking the "RESTART RUNTIME" button above or using the "Runtime > Restart runtime ..." menu. This is because of the way that Colab loads packages.

Check the TensorFlow and TFX versions.

import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
TensorFlow version: 2.4.1
WARNING:absl:RuntimeParameter is only supported on Cloud-based DAG runner currently.
TFX version: 0.30.0

Set up variables

There are some variables used to define a pipeline. You can customize these variables as you want. By default, all output from the pipeline will be generated under the current directory.

import os

# We will create two pipelines. One for schema generation and one for training.
SCHEMA_PIPELINE_NAME = "penguin-tfdv-schema"
PIPELINE_NAME = "penguin-tfdv"

# Output directory to store artifacts generated from the pipeline.
SCHEMA_PIPELINE_ROOT = os.path.join('pipelines', SCHEMA_PIPELINE_NAME)
PIPELINE_ROOT = os.path.join('pipelines', PIPELINE_NAME)
# Path to a SQLite DB file to use as an MLMD storage.
SCHEMA_METADATA_PATH = os.path.join('metadata', SCHEMA_PIPELINE_NAME,
                                    'metadata.db')
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.db')

# Output directory where created models from the pipeline will be exported.
SERVING_MODEL_DIR = os.path.join('serving_model', PIPELINE_NAME)

from absl import logging
logging.set_verbosity(logging.INFO)  # Set default logging level.

Prepare example data

We will download the example dataset for use in our TFX pipeline. The dataset we are using is the Palmer Penguins dataset, which is also used in other TFX examples.

There are four numeric features in this dataset:

  • culmen_length_mm
  • culmen_depth_mm
  • flipper_length_mm
  • body_mass_g

All features were already normalized to have range [0,1]. We will build a classification model which predicts the species of penguins.

Because the TFX ExampleGen component reads inputs from a directory, we need to create a directory and copy the dataset to it.

import urllib.request
import tempfile

DATA_ROOT = tempfile.mkdtemp(prefix='tfx-data')  # Create a temporary directory.
_data_url = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'
_data_filepath = os.path.join(DATA_ROOT, "data.csv")
urllib.request.urlretrieve(_data_url, _data_filepath)
('/tmp/tfx-dataehlez1c6/data.csv', <http.client.HTTPMessage at 0x7ffad0c96d10>)

Take a quick look at the CSV file.

!head {_data_filepath}
species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g
0,0.2545454545454545,0.6666666666666666,0.15254237288135594,0.2916666666666667
0,0.26909090909090905,0.5119047619047618,0.23728813559322035,0.3055555555555556
0,0.29818181818181805,0.5833333333333334,0.3898305084745763,0.1527777777777778
0,0.16727272727272732,0.7380952380952381,0.3559322033898305,0.20833333333333334
0,0.26181818181818167,0.892857142857143,0.3050847457627119,0.2638888888888889
0,0.24727272727272717,0.5595238095238096,0.15254237288135594,0.2569444444444444
0,0.25818181818181823,0.773809523809524,0.3898305084745763,0.5486111111111112
0,0.32727272727272727,0.5357142857142859,0.1694915254237288,0.1388888888888889
0,0.23636363636363636,0.9642857142857142,0.3220338983050847,0.3055555555555556

You should be able to see five feature columns. species is one of 0, 1 or 2, and all other features should have values between 0 and 1. We will create a TFX pipeline to analyze this dataset.
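
If you want to verify this quickly, a short check with pandas works as well. This is only a sketch, assuming pandas is available in the environment (it is preinstalled in Colab).

import pandas as pd

# Quick sanity check: the label takes the values 0, 1, 2 and the
# remaining features stay within the [0, 1] range.
df = pd.read_csv(_data_filepath)
print(sorted(df['species'].unique()))
print(df.drop(columns='species').describe().loc[['min', 'max']])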

Generate a preliminary schema

TFX pipelines are defined using Python APIs. We will create a pipeline to generate a schema from the input examples automatically. This schema can be reviewed by a human and adjusted as needed. Once the schema is finalized, it can be used for training and example validation in later tasks.

In addition to CsvExampleGen, which is used in the Simple TFX Pipeline Tutorial, we will use StatisticsGen and SchemaGen:

  • StatisticsGen calculates statistics for the dataset.
  • SchemaGen examines the statistics and creates an initial data schema.

Please see the guides for each component or the TFX components tutorial to learn more about these components.

Write a pipeline definition

We define a function to create a TFX pipeline. A Pipeline object represents a TFX pipeline, which can be run using one of the pipeline orchestration systems that TFX supports.

def _create_schema_pipeline(pipeline_name: str,
                            pipeline_root: str,
                            data_root: str,
                            metadata_path: str) -> tfx.dsl.Pipeline:
  """Creates a pipeline for schema generation."""
  # Brings data into the pipeline.
  example_gen = tfx.components.CsvExampleGen(input_base=data_root)

  # NEW: Computes statistics over data for visualization and schema generation.
  statistics_gen = tfx.components.StatisticsGen(
      examples=example_gen.outputs['examples'])

  # NEW: Generates schema based on the generated statistics.
  schema_gen = tfx.components.SchemaGen(
      statistics=statistics_gen.outputs['statistics'], infer_feature_shape=True)

  components = [
      example_gen,
      statistics_gen,
      schema_gen,
  ]

  return tfx.dsl.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      metadata_connection_config=tfx.orchestration.metadata
      .sqlite_metadata_connection_config(metadata_path),
      components=components)

Run the pipeline

We will use LocalDagRunner as in the previous tutorial.

tfx.orchestration.LocalDagRunner().run(
  _create_schema_pipeline(
      pipeline_name=SCHEMA_PIPELINE_NAME,
      pipeline_root=SCHEMA_PIPELINE_ROOT,
      data_root=DATA_ROOT,
      metadata_path=SCHEMA_METADATA_PATH))
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running pipeline:
 pipeline_info {
  id: "penguin-tfdv-schema"
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
      }
      id: "CsvExampleGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv-schema"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:50.209352"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv-schema.CsvExampleGen"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "examples"
        value {
          artifact_spec {
            type {
              name: "Examples"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
              properties {
                key: "version"
                value: INT
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "input_base"
        value {
          field_value {
            string_value: "/tmp/tfx-dataehlez1c6"
          }
        }
      }
      parameters {
        key: "input_config"
        value {
          field_value {
            string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"*\"\n    }\n  ]\n}"
          }
        }
      }
      parameters {
        key: "output_config"
        value {
          field_value {
            string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
          }
        }
      }
      parameters {
        key: "output_data_format"
        value {
          field_value {
            int_value: 6
          }
        }
      }
    }
    downstream_nodes: "StatisticsGen"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.statistics_gen.component.StatisticsGen"
      }
      id: "StatisticsGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv-schema"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:50.209352"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv-schema.StatisticsGen"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "examples"
        value {
          channels {
            producer_node_query {
              id: "CsvExampleGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv-schema"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:50.209352"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv-schema.CsvExampleGen"
                }
              }
            }
            artifact_query {
              type {
                name: "Examples"
              }
            }
            output_key: "examples"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "statistics"
        value {
          artifact_spec {
            type {
              name: "ExampleStatistics"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "exclude_splits"
        value {
          field_value {
            string_value: "[]"
          }
        }
      }
    }
    upstream_nodes: "CsvExampleGen"
    downstream_nodes: "SchemaGen"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.schema_gen.component.SchemaGen"
      }
      id: "SchemaGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv-schema"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:50.209352"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv-schema.SchemaGen"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "statistics"
        value {
          channels {
            producer_node_query {
              id: "StatisticsGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv-schema"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:50.209352"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv-schema.StatisticsGen"
                }
              }
            }
            artifact_query {
              type {
                name: "ExampleStatistics"
              }
            }
            output_key: "statistics"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "schema"
        value {
          artifact_spec {
            type {
              name: "Schema"
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "exclude_splits"
        value {
          field_value {
            string_value: "[]"
          }
        }
      }
      parameters {
        key: "infer_feature_shape"
        value {
          field_value {
            int_value: 1
          }
        }
      }
    }
    upstream_nodes: "StatisticsGen"
    execution_options {
      caching_options {
      }
    }
  }
}
runtime_spec {
  pipeline_root {
    field_value {
      string_value: "pipelines/penguin-tfdv-schema"
    }
  }
  pipeline_run_id {
    field_value {
      string_value: "2021-06-02T09:12:50.209352"
    }
  }
}
execution_mode: SYNC
deployment_config {
  type_url: "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig"
  value: "\n\236\001\n\rCsvExampleGen\022\214\001\nHtype.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec\022@\n>\n<tfx.components.example_gen.csv_example_gen.executor.Executor\n\220\001\n\rStatisticsGen\022\177\nHtype.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec\0223\n1\n/tfx.components.statistics_gen.executor.Executor\n\216\001\n\tSchemaGen\022\200\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\022-\n+tfx.components.schema_gen.executor.Executor\022\230\001\n\rCsvExampleGen\022\206\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\0223\n1tfx.components.example_gen.driver.FileBasedDriver*b\n0type.googleapis.com/ml_metadata.ConnectionConfig\022.\032,\n(metadata/penguin-tfdv-schema/metadata.db\020\003"
}

INFO:absl:Using deployment config:
 executor_specs {
  key: "CsvExampleGen"
  value {
    beam_executable_spec {
      python_executor_spec {
        class_path: "tfx.components.example_gen.csv_example_gen.executor.Executor"
      }
    }
  }
}
executor_specs {
  key: "SchemaGen"
  value {
    python_class_executable_spec {
      class_path: "tfx.components.schema_gen.executor.Executor"
    }
  }
}
executor_specs {
  key: "StatisticsGen"
  value {
    beam_executable_spec {
      python_executor_spec {
        class_path: "tfx.components.statistics_gen.executor.Executor"
      }
    }
  }
}
custom_driver_specs {
  key: "CsvExampleGen"
  value {
    python_class_executable_spec {
      class_path: "tfx.components.example_gen.driver.FileBasedDriver"
    }
  }
}
metadata_connection_config {
  sqlite {
    filename_uri: "metadata/penguin-tfdv-schema/metadata.db"
    connection_mode: READWRITE_OPENCREATE
  }
}

INFO:absl:Using connection config:
 sqlite {
  filename_uri: "metadata/penguin-tfdv-schema/metadata.db"
  connection_mode: READWRITE_OPENCREATE
}

INFO:absl:Component CsvExampleGen is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
  }
  id: "CsvExampleGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:50.209352"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema.CsvExampleGen"
      }
    }
  }
}
outputs {
  outputs {
    key: "examples"
    value {
      artifact_spec {
        type {
          name: "Examples"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
          properties {
            key: "version"
            value: INT
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "input_base"
    value {
      field_value {
        string_value: "/tmp/tfx-dataehlez1c6"
      }
    }
  }
  parameters {
    key: "input_config"
    value {
      field_value {
        string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"*\"\n    }\n  ]\n}"
      }
    }
  }
  parameters {
    key: "output_config"
    value {
      field_value {
        string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
      }
    }
  }
  parameters {
    key: "output_data_format"
    value {
      field_value {
        int_value: 6
      }
    }
  }
}
downstream_nodes: "StatisticsGen"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 1
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=1, input_dict={}, output_dict=defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "pipelines/penguin-tfdv-schema/CsvExampleGen/examples/1"
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
, artifact_type: name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}), exec_properties={'input_config': '{\n  "splits": [\n    {\n      "name": "single_split",\n      "pattern": "*"\n    }\n  ]\n}', 'output_config': '{\n  "split_config": {\n    "splits": [\n      {\n        "hash_buckets": 2,\n        "name": "train"\n      },\n      {\n        "hash_buckets": 1,\n        "name": "eval"\n      }\n    ]\n  }\n}', 'output_data_format': 6, 'input_base': '/tmp/tfx-dataehlez1c6', 'span': 0, 'version': None, 'input_fingerprint': 'split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170'}, execution_output_uri='pipelines/penguin-tfdv-schema/CsvExampleGen/.system/executor_execution/1/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv-schema/CsvExampleGen/.system/stateful_working_dir/2021-06-02T09:12:50.209352', tmp_dir='pipelines/penguin-tfdv-schema/CsvExampleGen/.system/executor_execution/1/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
  }
  id: "CsvExampleGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:50.209352"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema.CsvExampleGen"
      }
    }
  }
}
outputs {
  outputs {
    key: "examples"
    value {
      artifact_spec {
        type {
          name: "Examples"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
          properties {
            key: "version"
            value: INT
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "input_base"
    value {
      field_value {
        string_value: "/tmp/tfx-dataehlez1c6"
      }
    }
  }
  parameters {
    key: "input_config"
    value {
      field_value {
        string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"*\"\n    }\n  ]\n}"
      }
    }
  }
  parameters {
    key: "output_config"
    value {
      field_value {
        string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
      }
    }
  }
  parameters {
    key: "output_data_format"
    value {
      field_value {
        int_value: 6
      }
    }
  }
}
downstream_nodes: "StatisticsGen"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv-schema"
, pipeline_run_id='2021-06-02T09:12:50.209352')
INFO:absl:Generating examples.
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
INFO:absl:Processing input csv data /tmp/tfx-dataehlez1c6/* to TFExample.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Examples generated.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 1 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "pipelines/penguin-tfdv-schema/CsvExampleGen/examples/1"
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}) for execution 1
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component CsvExampleGen is finished.
INFO:absl:Component StatisticsGen is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.statistics_gen.component.StatisticsGen"
  }
  id: "StatisticsGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:50.209352"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema.StatisticsGen"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "CsvExampleGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:50.209352"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema.CsvExampleGen"
            }
          }
        }
        artifact_query {
          type {
            name: "Examples"
          }
        }
        output_key: "examples"
      }
    }
  }
}
outputs {
  outputs {
    key: "statistics"
    value {
      artifact_spec {
        type {
          name: "ExampleStatistics"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
}
upstream_nodes: "CsvExampleGen"
downstream_nodes: "SchemaGen"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 2
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=2, input_dict={'examples': [Artifact(artifact: id: 1
type_id: 6
uri: "pipelines/penguin-tfdv-schema/CsvExampleGen/examples/1"
properties {
  key: "split_names"
  value {
    string_value: "[\"train\", \"eval\"]"
  }
}
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "payload_format"
  value {
    string_value: "FORMAT_TF_EXAMPLE"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
state: LIVE
create_time_since_epoch: 1622625171427
last_update_time_since_epoch: 1622625171427
, artifact_type: id: 6
name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}, output_dict=defaultdict(<class 'list'>, {'statistics': [Artifact(artifact: uri: "pipelines/penguin-tfdv-schema/StatisticsGen/statistics/2"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:StatisticsGen:statistics:0"
  }
}
, artifact_type: name: "ExampleStatistics"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}), exec_properties={'exclude_splits': '[]'}, execution_output_uri='pipelines/penguin-tfdv-schema/StatisticsGen/.system/executor_execution/2/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv-schema/StatisticsGen/.system/stateful_working_dir/2021-06-02T09:12:50.209352', tmp_dir='pipelines/penguin-tfdv-schema/StatisticsGen/.system/executor_execution/2/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.statistics_gen.component.StatisticsGen"
  }
  id: "StatisticsGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:50.209352"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema.StatisticsGen"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "CsvExampleGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:50.209352"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema.CsvExampleGen"
            }
          }
        }
        artifact_query {
          type {
            name: "Examples"
          }
        }
        output_key: "examples"
      }
    }
  }
}
outputs {
  outputs {
    key: "statistics"
    value {
      artifact_spec {
        type {
          name: "ExampleStatistics"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
}
upstream_nodes: "CsvExampleGen"
downstream_nodes: "SchemaGen"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv-schema"
, pipeline_run_id='2021-06-02T09:12:50.209352')
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to pipelines/penguin-tfdv-schema/StatisticsGen/statistics/2/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to pipelines/penguin-tfdv-schema/StatisticsGen/statistics/2/Split-eval.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 2 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'statistics': [Artifact(artifact: uri: "pipelines/penguin-tfdv-schema/StatisticsGen/statistics/2"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:StatisticsGen:statistics:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "ExampleStatistics"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}) for execution 2
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component StatisticsGen is finished.
INFO:absl:Component SchemaGen is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.schema_gen.component.SchemaGen"
  }
  id: "SchemaGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:50.209352"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema.SchemaGen"
      }
    }
  }
}
inputs {
  inputs {
    key: "statistics"
    value {
      channels {
        producer_node_query {
          id: "StatisticsGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:50.209352"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema.StatisticsGen"
            }
          }
        }
        artifact_query {
          type {
            name: "ExampleStatistics"
          }
        }
        output_key: "statistics"
      }
    }
  }
}
outputs {
  outputs {
    key: "schema"
    value {
      artifact_spec {
        type {
          name: "Schema"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
  parameters {
    key: "infer_feature_shape"
    value {
      field_value {
        int_value: 1
      }
    }
  }
}
upstream_nodes: "StatisticsGen"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 3
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=3, input_dict={'statistics': [Artifact(artifact: id: 2
type_id: 8
uri: "pipelines/penguin-tfdv-schema/StatisticsGen/statistics/2"
properties {
  key: "split_names"
  value {
    string_value: "[\"train\", \"eval\"]"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:StatisticsGen:statistics:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
state: LIVE
create_time_since_epoch: 1622625173398
last_update_time_since_epoch: 1622625173398
, artifact_type: id: 8
name: "ExampleStatistics"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}, output_dict=defaultdict(<class 'list'>, {'schema': [Artifact(artifact: uri: "pipelines/penguin-tfdv-schema/SchemaGen/schema/3"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:SchemaGen:schema:0"
  }
}
, artifact_type: name: "Schema"
)]}), exec_properties={'infer_feature_shape': 1, 'exclude_splits': '[]'}, execution_output_uri='pipelines/penguin-tfdv-schema/SchemaGen/.system/executor_execution/3/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv-schema/SchemaGen/.system/stateful_working_dir/2021-06-02T09:12:50.209352', tmp_dir='pipelines/penguin-tfdv-schema/SchemaGen/.system/executor_execution/3/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.schema_gen.component.SchemaGen"
  }
  id: "SchemaGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:50.209352"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv-schema.SchemaGen"
      }
    }
  }
}
inputs {
  inputs {
    key: "statistics"
    value {
      channels {
        producer_node_query {
          id: "StatisticsGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:50.209352"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv-schema.StatisticsGen"
            }
          }
        }
        artifact_query {
          type {
            name: "ExampleStatistics"
          }
        }
        output_key: "statistics"
      }
    }
  }
}
outputs {
  outputs {
    key: "schema"
    value {
      artifact_spec {
        type {
          name: "Schema"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
  parameters {
    key: "infer_feature_shape"
    value {
      field_value {
        int_value: 1
      }
    }
  }
}
upstream_nodes: "StatisticsGen"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv-schema"
, pipeline_run_id='2021-06-02T09:12:50.209352')
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/tmp/tmpb83ytk8y.json', '--HistoryManager.hist_file=:memory:']
INFO:absl:Attempting to infer TFX Python dependency for beam
INFO:absl:Copying all content from install dir /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx to temp dir /tmp/tmpkag3zvkj/build/tfx
INFO:absl:Generating a temp setup file at /tmp/tmpkag3zvkj/build/tfx/setup.py
INFO:absl:Creating temporary sdist package, logs available at /tmp/tmpkag3zvkj/build/tfx/setup.log
INFO:absl:Added --extra_package=/tmp/tmpkag3zvkj/build/tfx/dist/tfx_ephemeral-0.30.0.tar.gz to beam args
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to pipelines/penguin-tfdv-schema/SchemaGen/schema/3/schema.pbtxt.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 3 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'schema': [Artifact(artifact: uri: "pipelines/penguin-tfdv-schema/SchemaGen/schema/3"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv-schema:2021-06-02T09:12:50.209352:SchemaGen:schema:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "Schema"
)]}) for execution 3
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component SchemaGen is finished.

You should see "INFO:absl:Component SchemaGen is finished." if the pipeline finished successfully.

We will examine the output of the pipeline to understand our dataset.

Review outputs of the pipeline

As explained in the previous tutorial, a TFX pipeline produces two kinds of outputs: artifacts and a metadata DB (MLMD) which contains metadata of the artifacts and the pipeline executions. We defined the location of these outputs in the cells above. By default, artifacts are stored under the pipelines directory and metadata is stored as a SQLite database under the metadata directory.
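
Before querying MLMD, you can also take a quick look at what the pipeline wrote to disk. The following shell command is just one way to list the generated directories.

# List the artifact and metadata directories created by the schema pipeline.
!find pipelines/penguin-tfdv-schema metadata/penguin-tfdv-schema -maxdepth 3 -type d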

You can use MLMD APIs to locate these outputs programmatically. First, we will define some utility functions to search for the output artifacts that were just produced.

from ml_metadata.proto import metadata_store_pb2
# Non-public APIs, just for showcase.
from tfx.orchestration.portable.mlmd import execution_lib

# TODO(b/171447278): Move these functions into the TFX library.

def get_latest_artifacts(metadata, pipeline_name, component_id):
  """Output artifacts of the latest run of the component."""
  context = metadata.store.get_context_by_type_and_name(
      'node', f'{pipeline_name}.{component_id}')
  executions = metadata.store.get_executions_by_context(context.id)
  latest_execution = max(executions,
                         key=lambda e:e.last_update_time_since_epoch)
  return execution_lib.get_artifacts_dict(metadata, latest_execution.id, 
                                          metadata_store_pb2.Event.OUTPUT)

# Non-public APIs, just for showcase.
from tfx.orchestration.experimental.interactive import visualizations

def visualize_artifacts(artifacts):
  """Visualizes artifacts using standard visualization modules."""
  for artifact in artifacts:
    visualization = visualizations.get_registry().get_visualization(
        artifact.type_name)
    if visualization:
      visualization.display(artifact)

from tfx.orchestration.experimental.interactive import standard_visualizations
standard_visualizations.register_standard_visualizations()

Now we can examine the outputs from the pipeline execution.

# Non-public APIs, just for showcase.
from tfx.orchestration.metadata import Metadata
from tfx.types import standard_component_specs

metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(
    SCHEMA_METADATA_PATH)

with Metadata(metadata_connection_config) as metadata_handler:
  # Find output artifacts from MLMD.
  stat_gen_output = get_latest_artifacts(metadata_handler, SCHEMA_PIPELINE_NAME,
                                         'StatisticsGen')
  stats_artifacts = stat_gen_output[standard_component_specs.STATISTICS_KEY]

  schema_gen_output = get_latest_artifacts(metadata_handler,
                                           SCHEMA_PIPELINE_NAME, 'SchemaGen')
  schema_artifacts = schema_gen_output[standard_component_specs.SCHEMA_KEY]
INFO:absl:MetadataStore with DB connection initialized

It is time to examine the outputs from each component. As described above, TensorFlow Data Validation (TFDV) is used in StatisticsGen and SchemaGen, and TFDV also provides visualization of the outputs from these components.

In this tutorial, we will use the visualization helper methods in TFX which use TFDV internally to show the visualization.

Examine the output from StatisticsGen

visualize_artifacts(stats_artifacts)

You can see various statistics for the input data. These statistics are supplied to SchemaGen to construct an initial schema of the data automatically.

Examine the output from SchemaGen

visualize_artifacts(schema_artifacts)

This schema is automatically inferred from the output of StatisticsGen. You should be able to see 4 FLOAT features and 1 INT feature.
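
The same schema can also be loaded and displayed with TFDV directly. This is only a sketch, assuming the SchemaGen artifact directory contains a schema.pbtxt file, as shown in the log output above.

import os
import tensorflow_data_validation as tfdv

# Load the schema.pbtxt written by SchemaGen and display it with TFDV.
_schema_file = os.path.join(schema_artifacts[0].uri, 'schema.pbtxt')
schema = tfdv.load_schema_text(_schema_file)
tfdv.display_schema(schema)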

Export the schema for future use

We need to review and refine the generated schema. The reviewed schema needs to be persisted so it can be used in subsequent pipelines for ML model training. In other words, you might want to add the schema file to your version control system for actual use cases. In this tutorial, we will just copy the schema to a predefined filesystem path for simplicity.

import shutil

_schema_filename = 'schema.pbtxt'
SCHEMA_PATH = 'schema'

os.makedirs(SCHEMA_PATH, exist_ok=True)
_generated_path = os.path.join(schema_artifacts[0].uri, _schema_filename)

# Copy the 'schema.pbtxt' file from the artifact uri to a predefined path.
shutil.copy(_generated_path, SCHEMA_PATH)
'schema/schema.pbtxt'

The schema file uses the Protocol Buffer text format and is an instance of the TensorFlow Metadata Schema proto.

print(f'Schema at {SCHEMA_PATH}-----')
!cat {SCHEMA_PATH}/*
Schema at schema-----
feature {
  name: "body_mass_g"
  type: FLOAT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
feature {
  name: "culmen_depth_mm"
  type: FLOAT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
feature {
  name: "culmen_length_mm"
  type: FLOAT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
feature {
  name: "flipper_length_mm"
  type: FLOAT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
feature {
  name: "species"
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}

You should be sure to review and possibly edit the schema definition as needed. In this tutorial, we will just use the generated schema unchanged.
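
For real use cases you would typically curate the schema at this point before adding it to version control. As a purely hypothetical example, you could constrain the species label to the integer range [0, 2] with TFDV helpers and write the edited schema back; we do not do this in this tutorial.

import os
import tensorflow_data_validation as tfdv
from tensorflow_metadata.proto.v0 import schema_pb2

# Hypothetical curation step: restrict the 'species' feature to 0..2 and
# persist the edited schema. Not applied in this tutorial.
schema = tfdv.load_schema_text(os.path.join(SCHEMA_PATH, 'schema.pbtxt'))
tfdv.set_domain(schema, 'species', schema_pb2.IntDomain(min=0, max=2))
tfdv.write_schema_text(schema, os.path.join(SCHEMA_PATH, 'schema.pbtxt'))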

Validate input examples and train an ML model

We will go back to the pipeline that we created in the Simple TFX Pipeline Tutorial to train an ML model and use the generated schema for writing the model training code.

We will also add an ExampleValidator component which will look for anomalies and missing values in the incoming dataset with respect to the schema.

Write model training code

We need to write the model code as in the Simple TFX Pipeline Tutorial.

The model itself is the same as in the previous tutorial, but this time we will use the schema generated from the previous pipeline instead of specifying features manually. Most of the code was not changed. The only difference is that we do not need to specify the names and types of features in this file. Instead, we read them from the schema file.

_trainer_module_file = 'penguin_trainer.py'
%%writefile {_trainer_module_file}

from typing import List
from absl import logging
import tensorflow as tf
from tensorflow import keras
from tensorflow_transform.tf_metadata import schema_utils

from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tensorflow_metadata.proto.v0 import schema_pb2

# We don't need to specify _FEATURE_KEYS and _FEATURE_SPEC any more.
# Those information can be read from the given schema file.

_LABEL_KEY = 'species'

_TRAIN_BATCH_SIZE = 20
_EVAL_BATCH_SIZE = 10

def _input_fn(file_pattern: List[str],
              data_accessor: tfx.components.DataAccessor,
              schema: schema_pb2.Schema,
              batch_size: int = 200) -> tf.data.Dataset:
  """Generates features and label for training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    schema: schema of the input data.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY),
      schema=schema).repeat()


def _build_keras_model(schema: schema_pb2.Schema) -> tf.keras.Model:
  """Creates a DNN Keras model for classifying penguin data.

  Returns:
    A Keras Model.
  """
  # The model below is built with Functional API, please refer to
  # https://www.tensorflow.org/guide/keras/overview for all API options.

  # ++ Changed code: Uses all features in the schema except the label.
  feature_keys = [f.name for f in schema.feature if f.name != _LABEL_KEY]
  inputs = [keras.layers.Input(shape=(1,), name=f) for f in feature_keys]
  # ++ End of the changed code.

  d = keras.layers.concatenate(inputs)
  for _ in range(2):
    d = keras.layers.Dense(8, activation='relu')(d)
  outputs = keras.layers.Dense(3)(d)

  model = keras.Model(inputs=inputs, outputs=outputs)
  model.compile(
      optimizer=keras.optimizers.Adam(1e-2),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[keras.metrics.SparseCategoricalAccuracy()])

  model.summary(print_fn=logging.info)
  return model


# TFX Trainer will call this function.
def run_fn(fn_args: tfx.components.FnArgs):
  """Train the model based on given args.

  Args:
    fn_args: Holds args used to train the model as name/value pairs.
  """

  # ++ Changed code: Reads in schema file passed to the Trainer component.
  schema = tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema_pb2.Schema())
  # ++ End of the changed code.

  train_dataset = _input_fn(
      fn_args.train_files,
      fn_args.data_accessor,
      schema,
      batch_size=_TRAIN_BATCH_SIZE)
  eval_dataset = _input_fn(
      fn_args.eval_files,
      fn_args.data_accessor,
      schema,
      batch_size=_EVAL_BATCH_SIZE)

  model = _build_keras_model(schema)
  model.fit(
      train_dataset,
      steps_per_epoch=fn_args.train_steps,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)

  # The result of the training should be saved in `fn_args.serving_model_dir`
  # directory.
  model.save(fn_args.serving_model_dir, save_format='tf')
Writing penguin_trainer.py

Now you have completed all of the preparation steps to build a TFX pipeline for model training.

Write a pipeline definition

We will add two new components, Importer and ExampleValidator. Importer brings an external file into the TFX pipeline. In this case, it is a file containing the schema definition. ExampleValidator will examine the input data and check whether all input data conforms to the data schema we provided.

def _create_pipeline(pipeline_name: str, pipeline_root: str, data_root: str,
                     schema_path: str, module_file: str, serving_model_dir: str,
                     metadata_path: str) -> tfx.dsl.Pipeline:
  """Creates a pipeline using predefined schema with TFX."""
  # Brings data into the pipeline.
  example_gen = tfx.components.CsvExampleGen(input_base=data_root)

  # Computes statistics over data for visualization and example validation.
  statistics_gen = tfx.components.StatisticsGen(
      examples=example_gen.outputs['examples'])

  # NEW: Import the schema.
  schema_importer = tfx.dsl.Importer(
      source_uri=schema_path,
      artifact_type=tfx.types.standard_artifacts.Schema).with_id(
          'schema_importer')

  # NEW: Performs anomaly detection based on statistics and data schema.
  example_validator = tfx.components.ExampleValidator(
      statistics=statistics_gen.outputs['statistics'],
      schema=schema_importer.outputs['result'])

  # Uses user-provided Python function that trains a model.
  trainer = tfx.components.Trainer(
      module_file=module_file,
      examples=example_gen.outputs['examples'],
      schema=schema_importer.outputs['result'],  # Pass the imported schema.
      train_args=tfx.proto.TrainArgs(num_steps=100),
      eval_args=tfx.proto.EvalArgs(num_steps=5))

  # Pushes the model to a filesystem destination.
  pusher = tfx.components.Pusher(
      model=trainer.outputs['model'],
      push_destination=tfx.proto.PushDestination(
          filesystem=tfx.proto.PushDestination.Filesystem(
              base_directory=serving_model_dir)))

  components = [
      example_gen,

      # NEW: Following three components were added to the pipeline.
      statistics_gen,
      schema_importer,
      example_validator,

      trainer,
      pusher,
  ]

  return tfx.dsl.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      metadata_connection_config=tfx.orchestration.metadata
      .sqlite_metadata_connection_config(metadata_path),
      components=components)

Run the pipeline

tfx.orchestration.LocalDagRunner().run(
  _create_pipeline(
      pipeline_name=PIPELINE_NAME,
      pipeline_root=PIPELINE_ROOT,
      data_root=DATA_ROOT,
      schema_path=SCHEMA_PATH,
      module_file=_trainer_module_file,
      serving_model_dir=SERVING_MODEL_DIR,
      metadata_path=METADATA_PATH))
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/penguin_trainer.py' (including modules: ['penguin_trainer']).
INFO:absl:User module package has hash fingerprint version 000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmp/tmpqzsnsqkg/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmp6ohfkiv2', '--dist-dir', '/tmp/tmpovmeeauf']
INFO:absl:Successfully built user code wheel distribution at 'pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl'; target user module is 'penguin_trainer'.
INFO:absl:Full user module path is 'penguin_trainer@pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl'
INFO:absl:Running pipeline:
 pipeline_info {
  id: "penguin-tfdv"
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
      }
      id: "CsvExampleGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:55.331585"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv.CsvExampleGen"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "examples"
        value {
          artifact_spec {
            type {
              name: "Examples"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
              properties {
                key: "version"
                value: INT
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "input_base"
        value {
          field_value {
            string_value: "/tmp/tfx-dataehlez1c6"
          }
        }
      }
      parameters {
        key: "input_config"
        value {
          field_value {
            string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"*\"\n    }\n  ]\n}"
          }
        }
      }
      parameters {
        key: "output_config"
        value {
          field_value {
            string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
          }
        }
      }
      parameters {
        key: "output_data_format"
        value {
          field_value {
            int_value: 6
          }
        }
      }
    }
    downstream_nodes: "StatisticsGen"
    downstream_nodes: "Trainer"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.dsl.components.common.importer.Importer"
      }
      id: "schema_importer"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:55.331585"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv.schema_importer"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "result"
        value {
          artifact_spec {
            type {
              name: "Schema"
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "artifact_uri"
        value {
          field_value {
            string_value: "schema"
          }
        }
      }
      parameters {
        key: "reimport"
        value {
          field_value {
            int_value: 0
          }
        }
      }
    }
    downstream_nodes: "ExampleValidator"
    downstream_nodes: "Trainer"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.statistics_gen.component.StatisticsGen"
      }
      id: "StatisticsGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:55.331585"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv.StatisticsGen"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "examples"
        value {
          channels {
            producer_node_query {
              id: "CsvExampleGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:55.331585"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv.CsvExampleGen"
                }
              }
            }
            artifact_query {
              type {
                name: "Examples"
              }
            }
            output_key: "examples"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "statistics"
        value {
          artifact_spec {
            type {
              name: "ExampleStatistics"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "exclude_splits"
        value {
          field_value {
            string_value: "[]"
          }
        }
      }
    }
    upstream_nodes: "CsvExampleGen"
    downstream_nodes: "ExampleValidator"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.trainer.component.Trainer"
      }
      id: "Trainer"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:55.331585"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv.Trainer"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "examples"
        value {
          channels {
            producer_node_query {
              id: "CsvExampleGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:55.331585"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv.CsvExampleGen"
                }
              }
            }
            artifact_query {
              type {
                name: "Examples"
              }
            }
            output_key: "examples"
          }
        }
      }
      inputs {
        key: "schema"
        value {
          channels {
            producer_node_query {
              id: "schema_importer"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:55.331585"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv.schema_importer"
                }
              }
            }
            artifact_query {
              type {
                name: "Schema"
              }
            }
            output_key: "result"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "model"
        value {
          artifact_spec {
            type {
              name: "Model"
            }
          }
        }
      }
      outputs {
        key: "model_run"
        value {
          artifact_spec {
            type {
              name: "ModelRun"
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "custom_config"
        value {
          field_value {
            string_value: "null"
          }
        }
      }
      parameters {
        key: "eval_args"
        value {
          field_value {
            string_value: "{\n  \"num_steps\": 5\n}"
          }
        }
      }
      parameters {
        key: "module_path"
        value {
          field_value {
            string_value: "penguin_trainer@pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl"
          }
        }
      }
      parameters {
        key: "train_args"
        value {
          field_value {
            string_value: "{\n  \"num_steps\": 100\n}"
          }
        }
      }
    }
    upstream_nodes: "CsvExampleGen"
    upstream_nodes: "schema_importer"
    downstream_nodes: "Pusher"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.example_validator.component.ExampleValidator"
      }
      id: "ExampleValidator"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:55.331585"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv.ExampleValidator"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "schema"
        value {
          channels {
            producer_node_query {
              id: "schema_importer"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:55.331585"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv.schema_importer"
                }
              }
            }
            artifact_query {
              type {
                name: "Schema"
              }
            }
            output_key: "result"
          }
        }
      }
      inputs {
        key: "statistics"
        value {
          channels {
            producer_node_query {
              id: "StatisticsGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:55.331585"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv.StatisticsGen"
                }
              }
            }
            artifact_query {
              type {
                name: "ExampleStatistics"
              }
            }
            output_key: "statistics"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "anomalies"
        value {
          artifact_spec {
            type {
              name: "ExampleAnomalies"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "exclude_splits"
        value {
          field_value {
            string_value: "[]"
          }
        }
      }
    }
    upstream_nodes: "StatisticsGen"
    upstream_nodes: "schema_importer"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.pusher.component.Pusher"
      }
      id: "Pusher"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "penguin-tfdv"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "2021-06-02T09:12:55.331585"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "penguin-tfdv.Pusher"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "model"
        value {
          channels {
            producer_node_query {
              id: "Trainer"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "2021-06-02T09:12:55.331585"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "penguin-tfdv.Trainer"
                }
              }
            }
            artifact_query {
              type {
                name: "Model"
              }
            }
            output_key: "model"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "pushed_model"
        value {
          artifact_spec {
            type {
              name: "PushedModel"
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "custom_config"
        value {
          field_value {
            string_value: "null"
          }
        }
      }
      parameters {
        key: "push_destination"
        value {
          field_value {
            string_value: "{\n  \"filesystem\": {\n    \"base_directory\": \"serving_model/penguin-tfdv\"\n  }\n}"
          }
        }
      }
    }
    upstream_nodes: "Trainer"
    execution_options {
      caching_options {
      }
    }
  }
}
runtime_spec {
  pipeline_root {
    field_value {
      string_value: "pipelines/penguin-tfdv"
    }
  }
  pipeline_run_id {
    field_value {
      string_value: "2021-06-02T09:12:55.331585"
    }
  }
}
execution_mode: SYNC
deployment_config {
  type_url: "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig"
  value: "\n\220\001\n\rStatisticsGen\022\177\nHtype.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec\0223\n1\n/tfx.components.statistics_gen.executor.Executor\n\206\001\n\006Pusher\022|\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\022)\n\'tfx.components.pusher.executor.Executor\n\220\001\n\007Trainer\022\204\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\0221\n/tfx.components.trainer.executor.GenericExecutor\n\236\001\n\rCsvExampleGen\022\214\001\nHtype.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec\022@\n>\n<tfx.components.example_gen.csv_example_gen.executor.Executor\n\234\001\n\020ExampleValidator\022\207\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\0224\n2tfx.components.example_validator.executor.Executor\022\230\001\n\rCsvExampleGen\022\206\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\0223\n1tfx.components.example_gen.driver.FileBasedDriver*[\n0type.googleapis.com/ml_metadata.ConnectionConfig\022\'\032%\n!metadata/penguin-tfdv/metadata.db\020\003"
}

INFO:absl:Using deployment config:
 executor_specs {
  key: "CsvExampleGen"
  value {
    beam_executable_spec {
      python_executor_spec {
        class_path: "tfx.components.example_gen.csv_example_gen.executor.Executor"
      }
    }
  }
}
executor_specs {
  key: "ExampleValidator"
  value {
    python_class_executable_spec {
      class_path: "tfx.components.example_validator.executor.Executor"
    }
  }
}
executor_specs {
  key: "Pusher"
  value {
    python_class_executable_spec {
      class_path: "tfx.components.pusher.executor.Executor"
    }
  }
}
executor_specs {
  key: "StatisticsGen"
  value {
    beam_executable_spec {
      python_executor_spec {
        class_path: "tfx.components.statistics_gen.executor.Executor"
      }
    }
  }
}
executor_specs {
  key: "Trainer"
  value {
    python_class_executable_spec {
      class_path: "tfx.components.trainer.executor.GenericExecutor"
    }
  }
}
custom_driver_specs {
  key: "CsvExampleGen"
  value {
    python_class_executable_spec {
      class_path: "tfx.components.example_gen.driver.FileBasedDriver"
    }
  }
}
metadata_connection_config {
  sqlite {
    filename_uri: "metadata/penguin-tfdv/metadata.db"
    connection_mode: READWRITE_OPENCREATE
  }
}

INFO:absl:Using connection config:
 sqlite {
  filename_uri: "metadata/penguin-tfdv/metadata.db"
  connection_mode: READWRITE_OPENCREATE
}

INFO:absl:Component CsvExampleGen is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
  }
  id: "CsvExampleGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.CsvExampleGen"
      }
    }
  }
}
outputs {
  outputs {
    key: "examples"
    value {
      artifact_spec {
        type {
          name: "Examples"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
          properties {
            key: "version"
            value: INT
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "input_base"
    value {
      field_value {
        string_value: "/tmp/tfx-dataehlez1c6"
      }
    }
  }
  parameters {
    key: "input_config"
    value {
      field_value {
        string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"*\"\n    }\n  ]\n}"
      }
    }
  }
  parameters {
    key: "output_config"
    value {
      field_value {
        string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
      }
    }
  }
  parameters {
    key: "output_data_format"
    value {
      field_value {
        int_value: 6
      }
    }
  }
}
downstream_nodes: "StatisticsGen"
downstream_nodes: "Trainer"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 1
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=1, input_dict={}, output_dict=defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "pipelines/penguin-tfdv/CsvExampleGen/examples/1"
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
, artifact_type: name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}), exec_properties={'input_base': '/tmp/tfx-dataehlez1c6', 'output_data_format': 6, 'output_config': '{\n  "split_config": {\n    "splits": [\n      {\n        "hash_buckets": 2,\n        "name": "train"\n      },\n      {\n        "hash_buckets": 1,\n        "name": "eval"\n      }\n    ]\n  }\n}', 'input_config': '{\n  "splits": [\n    {\n      "name": "single_split",\n      "pattern": "*"\n    }\n  ]\n}', 'span': 0, 'version': None, 'input_fingerprint': 'split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170'}, execution_output_uri='pipelines/penguin-tfdv/CsvExampleGen/.system/executor_execution/1/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv/CsvExampleGen/.system/stateful_working_dir/2021-06-02T09:12:55.331585', tmp_dir='pipelines/penguin-tfdv/CsvExampleGen/.system/executor_execution/1/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
  }
  id: "CsvExampleGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.CsvExampleGen"
      }
    }
  }
}
outputs {
  outputs {
    key: "examples"
    value {
      artifact_spec {
        type {
          name: "Examples"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
          properties {
            key: "version"
            value: INT
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "input_base"
    value {
      field_value {
        string_value: "/tmp/tfx-dataehlez1c6"
      }
    }
  }
  parameters {
    key: "input_config"
    value {
      field_value {
        string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"*\"\n    }\n  ]\n}"
      }
    }
  }
  parameters {
    key: "output_config"
    value {
      field_value {
        string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
      }
    }
  }
  parameters {
    key: "output_data_format"
    value {
      field_value {
        int_value: 6
      }
    }
  }
}
downstream_nodes: "StatisticsGen"
downstream_nodes: "Trainer"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv"
, pipeline_run_id='2021-06-02T09:12:55.331585')
INFO:absl:Generating examples.
INFO:absl:Processing input csv data /tmp/tfx-dataehlez1c6/* to TFExample.
INFO:absl:Examples generated.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 1 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "pipelines/penguin-tfdv/CsvExampleGen/examples/1"
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}) for execution 1
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component CsvExampleGen is finished.
INFO:absl:Component schema_importer is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.dsl.components.common.importer.Importer"
  }
  id: "schema_importer"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.schema_importer"
      }
    }
  }
}
outputs {
  outputs {
    key: "result"
    value {
      artifact_spec {
        type {
          name: "Schema"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "artifact_uri"
    value {
      field_value {
        string_value: "schema"
      }
    }
  }
  parameters {
    key: "reimport"
    value {
      field_value {
        int_value: 0
      }
    }
  }
}
downstream_nodes: "ExampleValidator"
downstream_nodes: "Trainer"
execution_options {
  caching_options {
  }
}

INFO:absl:Running as an importer node.
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Processing source uri: schema, properties: {}, custom_properties: {}
INFO:absl:Component schema_importer is finished.
INFO:absl:Component StatisticsGen is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.statistics_gen.component.StatisticsGen"
  }
  id: "StatisticsGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.StatisticsGen"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "CsvExampleGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.CsvExampleGen"
            }
          }
        }
        artifact_query {
          type {
            name: "Examples"
          }
        }
        output_key: "examples"
      }
    }
  }
}
outputs {
  outputs {
    key: "statistics"
    value {
      artifact_spec {
        type {
          name: "ExampleStatistics"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
}
upstream_nodes: "CsvExampleGen"
downstream_nodes: "ExampleValidator"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 3
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=3, input_dict={'examples': [Artifact(artifact: id: 1
type_id: 6
uri: "pipelines/penguin-tfdv/CsvExampleGen/examples/1"
properties {
  key: "split_names"
  value {
    string_value: "[\"train\", \"eval\"]"
  }
}
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "payload_format"
  value {
    string_value: "FORMAT_TF_EXAMPLE"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
state: LIVE
create_time_since_epoch: 1622625176396
last_update_time_since_epoch: 1622625176396
, artifact_type: id: 6
name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}, output_dict=defaultdict(<class 'list'>, {'statistics': [Artifact(artifact: uri: "pipelines/penguin-tfdv/StatisticsGen/statistics/3"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:StatisticsGen:statistics:0"
  }
}
, artifact_type: name: "ExampleStatistics"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}), exec_properties={'exclude_splits': '[]'}, execution_output_uri='pipelines/penguin-tfdv/StatisticsGen/.system/executor_execution/3/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv/StatisticsGen/.system/stateful_working_dir/2021-06-02T09:12:55.331585', tmp_dir='pipelines/penguin-tfdv/StatisticsGen/.system/executor_execution/3/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.statistics_gen.component.StatisticsGen"
  }
  id: "StatisticsGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.StatisticsGen"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "CsvExampleGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.CsvExampleGen"
            }
          }
        }
        artifact_query {
          type {
            name: "Examples"
          }
        }
        output_key: "examples"
      }
    }
  }
}
outputs {
  outputs {
    key: "statistics"
    value {
      artifact_spec {
        type {
          name: "ExampleStatistics"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
}
upstream_nodes: "CsvExampleGen"
downstream_nodes: "ExampleValidator"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv"
, pipeline_run_id='2021-06-02T09:12:55.331585')
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to pipelines/penguin-tfdv/StatisticsGen/statistics/3/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to pipelines/penguin-tfdv/StatisticsGen/statistics/3/Split-eval.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 3 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'statistics': [Artifact(artifact: uri: "pipelines/penguin-tfdv/StatisticsGen/statistics/3"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:StatisticsGen:statistics:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "ExampleStatistics"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}) for execution 3
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component StatisticsGen is finished.
INFO:absl:Component Trainer is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.trainer.component.Trainer"
  }
  id: "Trainer"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.Trainer"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "CsvExampleGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.CsvExampleGen"
            }
          }
        }
        artifact_query {
          type {
            name: "Examples"
          }
        }
        output_key: "examples"
      }
    }
  }
  inputs {
    key: "schema"
    value {
      channels {
        producer_node_query {
          id: "schema_importer"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.schema_importer"
            }
          }
        }
        artifact_query {
          type {
            name: "Schema"
          }
        }
        output_key: "result"
      }
    }
  }
}
outputs {
  outputs {
    key: "model"
    value {
      artifact_spec {
        type {
          name: "Model"
        }
      }
    }
  }
  outputs {
    key: "model_run"
    value {
      artifact_spec {
        type {
          name: "ModelRun"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "custom_config"
    value {
      field_value {
        string_value: "null"
      }
    }
  }
  parameters {
    key: "eval_args"
    value {
      field_value {
        string_value: "{\n  \"num_steps\": 5\n}"
      }
    }
  }
  parameters {
    key: "module_path"
    value {
      field_value {
        string_value: "penguin_trainer@pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl"
      }
    }
  }
  parameters {
    key: "train_args"
    value {
      field_value {
        string_value: "{\n  \"num_steps\": 100\n}"
      }
    }
  }
}
upstream_nodes: "CsvExampleGen"
upstream_nodes: "schema_importer"
downstream_nodes: "Pusher"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 4
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=4, input_dict={'schema': [Artifact(artifact: id: 2
type_id: 8
uri: "schema"
state: LIVE
create_time_since_epoch: 1622625176422
last_update_time_since_epoch: 1622625176422
, artifact_type: id: 8
name: "Schema"
)], 'examples': [Artifact(artifact: id: 1
type_id: 6
uri: "pipelines/penguin-tfdv/CsvExampleGen/examples/1"
properties {
  key: "split_names"
  value {
    string_value: "[\"train\", \"eval\"]"
  }
}
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1622625170,sum_checksum:1622625170"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "payload_format"
  value {
    string_value: "FORMAT_TF_EXAMPLE"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
state: LIVE
create_time_since_epoch: 1622625176396
last_update_time_since_epoch: 1622625176396
, artifact_type: id: 6
name: "Examples"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
properties {
  key: "version"
  value: INT
}
)]}, output_dict=defaultdict(<class 'list'>, {'model': [Artifact(artifact: uri: "pipelines/penguin-tfdv/Trainer/model/4"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Trainer:model:0"
  }
}
, artifact_type: name: "Model"
)], 'model_run': [Artifact(artifact: uri: "pipelines/penguin-tfdv/Trainer/model_run/4"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Trainer:model_run:0"
  }
}
, artifact_type: name: "ModelRun"
)]}), exec_properties={'module_path': 'penguin_trainer@pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl', 'train_args': '{\n  "num_steps": 100\n}', 'eval_args': '{\n  "num_steps": 5\n}', 'custom_config': 'null'}, execution_output_uri='pipelines/penguin-tfdv/Trainer/.system/executor_execution/4/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv/Trainer/.system/stateful_working_dir/2021-06-02T09:12:55.331585', tmp_dir='pipelines/penguin-tfdv/Trainer/.system/executor_execution/4/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.trainer.component.Trainer"
  }
  id: "Trainer"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.Trainer"
      }
    }
  }
}
inputs {
  inputs {
    key: "examples"
    value {
      channels {
        producer_node_query {
          id: "CsvExampleGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.CsvExampleGen"
            }
          }
        }
        artifact_query {
          type {
            name: "Examples"
          }
        }
        output_key: "examples"
      }
    }
  }
  inputs {
    key: "schema"
    value {
      channels {
        producer_node_query {
          id: "schema_importer"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.schema_importer"
            }
          }
        }
        artifact_query {
          type {
            name: "Schema"
          }
        }
        output_key: "result"
      }
    }
  }
}
outputs {
  outputs {
    key: "model"
    value {
      artifact_spec {
        type {
          name: "Model"
        }
      }
    }
  }
  outputs {
    key: "model_run"
    value {
      artifact_spec {
        type {
          name: "ModelRun"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "custom_config"
    value {
      field_value {
        string_value: "null"
      }
    }
  }
  parameters {
    key: "eval_args"
    value {
      field_value {
        string_value: "{\n  \"num_steps\": 5\n}"
      }
    }
  }
  parameters {
    key: "module_path"
    value {
      field_value {
        string_value: "penguin_trainer@pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl"
      }
    }
  }
  parameters {
    key: "train_args"
    value {
      field_value {
        string_value: "{\n  \"num_steps\": 100\n}"
      }
    }
  }
}
upstream_nodes: "CsvExampleGen"
upstream_nodes: "schema_importer"
downstream_nodes: "Pusher"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv"
, pipeline_run_id='2021-06-02T09:12:55.331585')
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/tmp/tmpb83ytk8y.json', '--HistoryManager.hist_file=:memory:']
INFO:absl:Attempting to infer TFX Python dependency for beam
INFO:absl:Copying all content from install dir /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx to temp dir /tmp/tmpca8we0v7/build/tfx
INFO:absl:Generating a temp setup file at /tmp/tmpca8we0v7/build/tfx/setup.py
INFO:absl:Creating temporary sdist package, logs available at /tmp/tmpca8we0v7/build/tfx/setup.log
INFO:absl:Added --extra_package=/tmp/tmpca8we0v7/build/tfx/dist/tfx_ephemeral-0.30.0.tar.gz to beam args
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
ERROR:absl:udf_utils.get_fn {'module_path': 'penguin_trainer@pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl', 'train_args': '{\n  "num_steps": 100\n}', 'eval_args': '{\n  "num_steps": 5\n}', 'custom_config': 'null'} 'run_fn'
INFO:absl:Installing 'pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp0vby84aq', 'pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl']
INFO:absl:Successfully installed 'pipelines/penguin-tfdv/_wheels/tfx_user_code_Trainer-0.0+000876a22093ec764e3751d5a3ed939f1b107d1d6ade133f954ea2a767b8dfb2-py3-none-any.whl'.
INFO:absl:Training model.
INFO:absl:Feature body_mass_g has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_depth_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature flipper_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature species has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature body_mass_g has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_depth_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature flipper_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature species has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature body_mass_g has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_depth_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature flipper_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature species has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature body_mass_g has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_depth_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature culmen_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature flipper_length_mm has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Feature species has a shape dim {
  size: 1
}
. Setting to DenseTensor.
INFO:absl:Model: "model"
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Layer (type)                    Output Shape         Param #     Connected to                     
INFO:absl:==================================================================================================
INFO:absl:body_mass_g (InputLayer)        [(None, 1)]          0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:culmen_depth_mm (InputLayer)    [(None, 1)]          0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:culmen_length_mm (InputLayer)   [(None, 1)]          0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:flipper_length_mm (InputLayer)  [(None, 1)]          0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:concatenate (Concatenate)       (None, 4)            0           body_mass_g[0][0]                
INFO:absl:                                                                 culmen_depth_mm[0][0]            
INFO:absl:                                                                 culmen_length_mm[0][0]           
INFO:absl:                                                                 flipper_length_mm[0][0]          
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense (Dense)                   (None, 8)            40          concatenate[0][0]                
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_1 (Dense)                 (None, 8)            72          dense[0][0]                      
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_2 (Dense)                 (None, 3)            27          dense_1[0][0]                    
INFO:absl:==================================================================================================
INFO:absl:Total params: 139
INFO:absl:Trainable params: 139
INFO:absl:Non-trainable params: 0
INFO:absl:__________________________________________________________________________________________________
100/100 [==============================] - 1s 6ms/step - loss: 0.8024 - sparse_categorical_accuracy: 0.6896 - val_loss: 0.3546 - val_sparse_categorical_accuracy: 0.8400
INFO:tensorflow:Assets written to: pipelines/penguin-tfdv/Trainer/model/4/Format-Serving/assets
INFO:tensorflow:Assets written to: pipelines/penguin-tfdv/Trainer/model/4/Format-Serving/assets
INFO:absl:Training complete. Model written to pipelines/penguin-tfdv/Trainer/model/4/Format-Serving. ModelRun written to pipelines/penguin-tfdv/Trainer/model_run/4
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 4 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'model': [Artifact(artifact: uri: "pipelines/penguin-tfdv/Trainer/model/4"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Trainer:model:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "Model"
)], 'model_run': [Artifact(artifact: uri: "pipelines/penguin-tfdv/Trainer/model_run/4"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Trainer:model_run:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "ModelRun"
)]}) for execution 4
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Trainer is finished.
INFO:absl:Component ExampleValidator is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.example_validator.component.ExampleValidator"
  }
  id: "ExampleValidator"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.ExampleValidator"
      }
    }
  }
}
inputs {
  inputs {
    key: "schema"
    value {
      channels {
        producer_node_query {
          id: "schema_importer"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.schema_importer"
            }
          }
        }
        artifact_query {
          type {
            name: "Schema"
          }
        }
        output_key: "result"
      }
    }
  }
  inputs {
    key: "statistics"
    value {
      channels {
        producer_node_query {
          id: "StatisticsGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.StatisticsGen"
            }
          }
        }
        artifact_query {
          type {
            name: "ExampleStatistics"
          }
        }
        output_key: "statistics"
      }
    }
  }
}
outputs {
  outputs {
    key: "anomalies"
    value {
      artifact_spec {
        type {
          name: "ExampleAnomalies"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
}
upstream_nodes: "StatisticsGen"
upstream_nodes: "schema_importer"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 5
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=5, input_dict={'schema': [Artifact(artifact: id: 2
type_id: 8
uri: "schema"
state: LIVE
create_time_since_epoch: 1622625176422
last_update_time_since_epoch: 1622625176422
, artifact_type: id: 8
name: "Schema"
)], 'statistics': [Artifact(artifact: id: 3
type_id: 10
uri: "pipelines/penguin-tfdv/StatisticsGen/statistics/3"
properties {
  key: "split_names"
  value {
    string_value: "[\"train\", \"eval\"]"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:StatisticsGen:statistics:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
state: LIVE
create_time_since_epoch: 1622625179371
last_update_time_since_epoch: 1622625179371
, artifact_type: id: 10
name: "ExampleStatistics"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}, output_dict=defaultdict(<class 'list'>, {'anomalies': [Artifact(artifact: uri: "pipelines/penguin-tfdv/ExampleValidator/anomalies/5"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:ExampleValidator:anomalies:0"
  }
}
, artifact_type: name: "ExampleAnomalies"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}), exec_properties={'exclude_splits': '[]'}, execution_output_uri='pipelines/penguin-tfdv/ExampleValidator/.system/executor_execution/5/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv/ExampleValidator/.system/stateful_working_dir/2021-06-02T09:12:55.331585', tmp_dir='pipelines/penguin-tfdv/ExampleValidator/.system/executor_execution/5/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.example_validator.component.ExampleValidator"
  }
  id: "ExampleValidator"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.ExampleValidator"
      }
    }
  }
}
inputs {
  inputs {
    key: "schema"
    value {
      channels {
        producer_node_query {
          id: "schema_importer"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.schema_importer"
            }
          }
        }
        artifact_query {
          type {
            name: "Schema"
          }
        }
        output_key: "result"
      }
    }
  }
  inputs {
    key: "statistics"
    value {
      channels {
        producer_node_query {
          id: "StatisticsGen"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.StatisticsGen"
            }
          }
        }
        artifact_query {
          type {
            name: "ExampleStatistics"
          }
        }
        output_key: "statistics"
      }
    }
  }
}
outputs {
  outputs {
    key: "anomalies"
    value {
      artifact_spec {
        type {
          name: "ExampleAnomalies"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "exclude_splits"
    value {
      field_value {
        string_value: "[]"
      }
    }
  }
}
upstream_nodes: "StatisticsGen"
upstream_nodes: "schema_importer"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv"
, pipeline_run_id='2021-06-02T09:12:55.331585')
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/tmp/tmpb83ytk8y.json', '--HistoryManager.hist_file=:memory:']
INFO:absl:Attempting to infer TFX Python dependency for beam
INFO:absl:Copying all content from install dir /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx to temp dir /tmp/tmpvyg3vbr7/build/tfx
INFO:absl:Generating a temp setup file at /tmp/tmpvyg3vbr7/build/tfx/setup.py
INFO:absl:Creating temporary sdist package, logs available at /tmp/tmpvyg3vbr7/build/tfx/setup.log
INFO:absl:Added --extra_package=/tmp/tmpvyg3vbr7/build/tfx/dist/tfx_ephemeral-0.30.0.tar.gz to beam args
INFO:absl:Validating schema against the computed statistics for split train.
INFO:absl:Validation complete for split train. Anomalies written to pipelines/penguin-tfdv/ExampleValidator/anomalies/5/Split-train.
INFO:absl:Validating schema against the computed statistics for split eval.
INFO:absl:Validation complete for split eval. Anomalies written to pipelines/penguin-tfdv/ExampleValidator/anomalies/5/Split-eval.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 5 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'anomalies': [Artifact(artifact: uri: "pipelines/penguin-tfdv/ExampleValidator/anomalies/5"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:ExampleValidator:anomalies:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "ExampleAnomalies"
properties {
  key: "span"
  value: INT
}
properties {
  key: "split_names"
  value: STRING
}
)]}) for execution 5
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component ExampleValidator is finished.
INFO:absl:Component Pusher is running.
INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.components.pusher.component.Pusher"
  }
  id: "Pusher"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.Pusher"
      }
    }
  }
}
inputs {
  inputs {
    key: "model"
    value {
      channels {
        producer_node_query {
          id: "Trainer"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.Trainer"
            }
          }
        }
        artifact_query {
          type {
            name: "Model"
          }
        }
        output_key: "model"
      }
    }
  }
}
outputs {
  outputs {
    key: "pushed_model"
    value {
      artifact_spec {
        type {
          name: "PushedModel"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "custom_config"
    value {
      field_value {
        string_value: "null"
      }
    }
  }
  parameters {
    key: "push_destination"
    value {
      field_value {
        string_value: "{\n  \"filesystem\": {\n    \"base_directory\": \"serving_model/penguin-tfdv\"\n  }\n}"
      }
    }
  }
}
upstream_nodes: "Trainer"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with DB connection initialized
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 6
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=6, input_dict={'model': [Artifact(artifact: id: 4
type_id: 12
uri: "pipelines/penguin-tfdv/Trainer/model/4"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Trainer:model:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
state: LIVE
create_time_since_epoch: 1622625185204
last_update_time_since_epoch: 1622625185204
, artifact_type: id: 12
name: "Model"
)]}, output_dict=defaultdict(<class 'list'>, {'pushed_model': [Artifact(artifact: uri: "pipelines/penguin-tfdv/Pusher/pushed_model/6"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Pusher:pushed_model:0"
  }
}
, artifact_type: name: "PushedModel"
)]}), exec_properties={'push_destination': '{\n  "filesystem": {\n    "base_directory": "serving_model/penguin-tfdv"\n  }\n}', 'custom_config': 'null'}, execution_output_uri='pipelines/penguin-tfdv/Pusher/.system/executor_execution/6/executor_output.pb', stateful_working_dir='pipelines/penguin-tfdv/Pusher/.system/stateful_working_dir/2021-06-02T09:12:55.331585', tmp_dir='pipelines/penguin-tfdv/Pusher/.system/executor_execution/6/.temp/', pipeline_node=node_info {
  type {
    name: "tfx.components.pusher.component.Pusher"
  }
  id: "Pusher"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "penguin-tfdv"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "2021-06-02T09:12:55.331585"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "penguin-tfdv.Pusher"
      }
    }
  }
}
inputs {
  inputs {
    key: "model"
    value {
      channels {
        producer_node_query {
          id: "Trainer"
        }
        context_queries {
          type {
            name: "pipeline"
          }
          name {
            field_value {
              string_value: "penguin-tfdv"
            }
          }
        }
        context_queries {
          type {
            name: "pipeline_run"
          }
          name {
            field_value {
              string_value: "2021-06-02T09:12:55.331585"
            }
          }
        }
        context_queries {
          type {
            name: "node"
          }
          name {
            field_value {
              string_value: "penguin-tfdv.Trainer"
            }
          }
        }
        artifact_query {
          type {
            name: "Model"
          }
        }
        output_key: "model"
      }
    }
  }
}
outputs {
  outputs {
    key: "pushed_model"
    value {
      artifact_spec {
        type {
          name: "PushedModel"
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "custom_config"
    value {
      field_value {
        string_value: "null"
      }
    }
  }
  parameters {
    key: "push_destination"
    value {
      field_value {
        string_value: "{\n  \"filesystem\": {\n    \"base_directory\": \"serving_model/penguin-tfdv\"\n  }\n}"
      }
    }
  }
}
upstream_nodes: "Trainer"
execution_options {
  caching_options {
  }
}
, pipeline_info=id: "penguin-tfdv"
, pipeline_run_id='2021-06-02T09:12:55.331585')
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/tmp/tmpb83ytk8y.json', '--HistoryManager.hist_file=:memory:']
INFO:absl:Attempting to infer TFX Python dependency for beam
INFO:absl:Copying all content from install dir /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx to temp dir /tmp/tmpayhi1ukl/build/tfx
INFO:absl:Generating a temp setup file at /tmp/tmpayhi1ukl/build/tfx/setup.py
INFO:absl:Creating temporary sdist package, logs available at /tmp/tmpayhi1ukl/build/tfx/setup.log
INFO:absl:Added --extra_package=/tmp/tmpayhi1ukl/build/tfx/dist/tfx_ephemeral-0.30.0.tar.gz to beam args
WARNING:absl:Pusher is going to push the model without validation. Consider using Evaluator or InfraValidator in your pipeline.
INFO:absl:Model version: 1622625187
INFO:absl:Model written to serving path serving_model/penguin-tfdv/1622625187.
INFO:absl:Model pushed to pipelines/penguin-tfdv/Pusher/pushed_model/6.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 6 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'pushed_model': [Artifact(artifact: uri: "pipelines/penguin-tfdv/Pusher/pushed_model/6"
custom_properties {
  key: "name"
  value {
    string_value: "penguin-tfdv:2021-06-02T09:12:55.331585:Pusher:pushed_model:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "0.30.0"
  }
}
, artifact_type: name: "PushedModel"
)]}) for execution 6
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Pusher is finished.

Sie sollten "INFO:absl:Component Pusher is done." sehen. wenn die Pipeline erfolgreich beendet wurde.

Examine outputs of the pipeline

We have trained the classification model for penguins, and we have also validated the input examples with the ExampleValidator component. We can analyze the output from ExampleValidator in the same way as we did for the previous pipeline.

# Connect to the ML Metadata (MLMD) store that the pipeline wrote to.
metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(
    METADATA_PATH)

with Metadata(metadata_connection_config) as metadata_handler:
  # Look up the output artifacts from the latest ExampleValidator run.
  ev_output = get_latest_artifacts(metadata_handler, PIPELINE_NAME,
                                   'ExampleValidator')
  anomalies_artifacts = ev_output[standard_component_specs.ANOMALIES_KEY]
INFO:absl:MetadataStore with DB connection initialized
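
For reference, `Metadata` and `standard_component_specs` in the cell above come from the TFX package, while `get_latest_artifacts` and `visualize_artifacts` are the small helper functions defined earlier in this notebook (in the schema-generation pipeline section). A minimal sketch of the imports the snippet relies on:

from tfx.orchestration.metadata import Metadata
from tfx.types import standard_component_specs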

Example anomalies from the ExampleValidator can be visualized as well.

visualize_artifacts(anomalies_artifacts)
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/display_util.py:217: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.
  pd.set_option('max_colwidth', -1)

You should see "No anomalies found" for each split of the examples. Because we used the same data that was used for schema generation in this pipeline, no anomaly is expected here. If you run this pipeline repeatedly with new incoming data, ExampleValidator should be able to find any discrepancies between the new data and the existing schema.
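
You can also reproduce this check outside the pipeline, for example for a new batch of incoming data, by calling TensorFlow Data Validation directly. The following is a minimal sketch: `new_penguin_data.csv` is a hypothetical new data file, and `SCHEMA_PATH` / `schema.pbtxt` stand for wherever the curated schema from the first pipeline was saved in your setup.

import os
import tensorflow_data_validation as tfdv

# Load the curated schema produced by the schema-generation pipeline.
schema = tfdv.load_schema_text(os.path.join(SCHEMA_PATH, 'schema.pbtxt'))

# Compute statistics for a (hypothetical) new batch of data and validate them
# against the existing schema.
new_stats = tfdv.generate_statistics_from_csv(
    data_location='new_penguin_data.csv')
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)

# An empty anomalies table means the new data still conforms to the schema.
tfdv.display_anomalies(anomalies)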

If any anomalies were found, you should review your data to check whether some examples do not follow your assumptions. Outputs from other components, like StatisticsGen, might be useful for that. Note, however, that anomalies found by ExampleValidator will NOT block further pipeline executions on their own.
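
Because TFX does not stop on anomalies by itself, a simple guard like the sketch below (reusing the `anomalies` proto from the previous snippet) can be used to fail fast when new data violates the schema:

# Abort early if any feature triggered an anomaly; `anomaly_info` is a map
# keyed by feature name in the Anomalies proto.
if anomalies.anomaly_info:
  raise ValueError(
      'Data anomalies found for features: {}'.format(
          list(anomalies.anomaly_info.keys())))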

Next steps

You can find more resources at https://www.tensorflow.org/tfx/tutorials

Please see Understanding TFX Pipelines to learn more about various concepts in TFX.