TFX Keras Component Tutorial

A Component-by-Component Introduction to TensorFlow Extended (TFX)

This Colab-based tutorial interactively walks through each built-in component of TensorFlow Extended (TFX).

It covers every step in an end-to-end machine learning pipeline, from data ingestion to pushing a model to serving.

When you're done, the contents of this notebook can be automatically exported as TFX pipeline source code, which you can orchestrate with Apache Airflow and Apache Beam.

Background

This notebook demonstrates how to use TFX in a Jupyter/Colab environment. Here, we walk through the Chicago Taxi example in an interactive notebook.

Working in an interactive notebook is a useful way to become familiar with the structure of a TFX pipeline. It is also useful as a lightweight development environment when building your own pipelines, but be aware that interactive notebooks differ from production deployments in how they are orchestrated and in how they access metadata artifacts.

Orchestration

In a production deployment of TFX, you use an orchestrator such as Apache Airflow, Kubeflow Pipelines, or Apache Beam to orchestrate a pre-defined pipeline graph of TFX components. In an interactive notebook, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells.
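
To make the idea of a pipeline graph concrete, here is a minimal sketch that derives a valid run order for the components used in this tutorial. It relies only on Python's standard graphlib module and is not how any TFX orchestrator is implemented; the dependencies dictionary and the execution_order helper are names invented for this illustration.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of upstream components whose outputs it
# consumes. The edges mirror how the components are wired in this tutorial.
dependencies = {
    "CsvExampleGen": set(),
    "StatisticsGen": {"CsvExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"CsvExampleGen", "SchemaGen"},
}


def execution_order(graph):
    """Return one valid order in which every component follows its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dependencies)
print(order)
```

In the notebook, you play the role of this scheduler yourself by running the cells in an order that respects these dependencies.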

Metadata

In a production deployment of TFX, you access metadata through the ML Metadata (MLMD) API. MLMD stores metadata properties in a database such as MySQL or SQLite, and stores the metadata payloads in a persistent store such as your filesystem. In an interactive notebook, both properties and payloads are stored in an ephemeral SQLite database in the /tmp directory on the Jupyter notebook or Colab server.
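
The split between properties and payloads can be pictured with a toy store built on Python's sqlite3 module. This is only a sketch: the artifact table, its columns, and the placeholder payload file are invented for illustration and do not reflect the real MLMD schema or API.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified layout; the real MLMD schema is different.
workdir = tempfile.mkdtemp(prefix="toy-metadata-")
db_path = os.path.join(workdir, "metadata.sqlite")

conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT, uri TEXT)")

# The payload is a file on disk; the database only records where it lives.
payload_path = os.path.join(workdir, "schema.pbtxt")
with open(payload_path, "w") as f:
    f.write("# payload placeholder\n")

conn.execute("INSERT INTO artifact (type, uri) VALUES (?, ?)",
             ("Schema", payload_path))
conn.commit()

rows = conn.execute("SELECT type, uri FROM artifact").fetchall()
print(rows)
conn.close()
```

Because everything sits in a temporary directory, the records disappear when that directory is cleaned up, which is the sense in which the notebook's store is ephemeral.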

Setup

First, we install and import the necessary packages, set up paths, and download data.

Upgrade Pip

To avoid upgrading Pip on a system when running locally, check that we are running in Colab. Local systems can of course be upgraded separately.

try:
  import colab
  !pip install --upgrade pip
except ImportError:
  pass

Install TFX

pip install -U tfx

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime (Runtime > Restart runtime ...). This is because of the way that Colab loads packages.

Import packages

We import necessary packages, including standard TFX component classes.

import os
import pprint
import tempfile
import urllib

import absl
import tensorflow as tf
import tensorflow_model_analysis as tfma
tf.get_logger().propagate = False
pp = pprint.PrettyPrinter()

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip
2021-07-27 09:07:32.686219: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

Let's check the library versions.

print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))
TensorFlow version: 2.5.0
TFX version: 1.0.0

Set up pipeline paths

# This is the root directory for your TFX pip package installation.
_tfx_root = tfx.__path__[0]

# This is the directory containing the TFX Chicago Taxi Pipeline example.
_taxi_root = os.path.join(_tfx_root, 'examples/chicago_taxi_pipeline')

# This is the path where your model will be pushed for serving.
_serving_model_dir = os.path.join(
    tempfile.mkdtemp(), 'serving_model/taxi_simple')

# Set up logging.
absl.logging.set_verbosity(absl.logging.INFO)

Download example data

We download the example dataset for use in our TFX pipeline.

The dataset we're using is the Taxi Trips dataset released by the City of Chicago. The columns in this dataset are:

pickup_community_area   fare                    trip_start_month
trip_start_hour         trip_start_day          trip_start_timestamp
pickup_latitude         pickup_longitude        dropoff_latitude
dropoff_longitude       trip_miles              pickup_census_tract
dropoff_census_tract    payment_type            company
trip_seconds            dropoff_community_area  tips

With this dataset, we will build a model that predicts the tips of a trip.

_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv'
_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)
('/tmp/tfx-datazpmc_2k0/data.csv', <http.client.HTTPMessage at 0x7f29e4316290>)

Take a quick look at the CSV file.

head {_data_filepath}
pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
,12.45,5,19,6,1400269500,,,,,0.0,,,Credit Card,Chicago Elite Cab Corp. (Chicago Carriag,0,,0.0
,0,3,19,5,1362683700,,,,,0,,,Unknown,Chicago Elite Cab Corp.,300,,0
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
16,32.05,12,1,1,1417916700,41.953582125,-87.72345239,,,15.4,,,Cash,,1200,,0.0
30,38.45,10,10,5,1444301100,41.839086906,-87.714003807,,,14.6,,,Cash,,2580,,0.0
11,14.65,1,1,3,1358213400,41.978829526,-87.771166703,,,5.81,,,Cash,,1080,,0.0
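
Many cells in these rows are empty, which matters later when missing values have to be filled in. As a small sketch, the header and the first and third data rows from the output above can be parsed with Python's csv module to count empty cells per column. The SAMPLE_CSV string and the missing dictionary are names introduced for this example only.

```python
import csv
import io

# Header and two data rows of data.csv, copied from the output above.
SAMPLE_CSV = "\n".join([
    "pickup_community_area,fare,trip_start_month,trip_start_hour,"
    "trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,"
    "dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,"
    "dropoff_census_tract,payment_type,company,trip_seconds,"
    "dropoff_community_area,tips",
    ",12.45,5,19,6,1400269500,,,,,0.0,,,Credit Card,"
    "Chicago Elite Cab Corp. (Chicago Carriag,0,,0.0",
    "60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,"
    "Taxi Affiliation Services,1380,,0.0",
])

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Count empty cells per column; an empty string marks a missing value.
missing = {name: sum(1 for row in rows if row[name] == "")
           for name in rows[0]}

print(len(rows), "rows,", len(rows[0]), "columns")
print({name: count for name, count in missing.items() if count})
```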

Disclaimer: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one's own risk.

Create the InteractiveContext

Last, we create an InteractiveContext, which allows us to run TFX components interactively in this notebook.

# Here, we create an InteractiveContext using default parameters. This will
# use a temporary directory with an ephemeral ML Metadata database instance.
# To use your own pipeline root or database, the optional properties
# `pipeline_root` and `metadata_connection_config` may be passed to
# InteractiveContext. Calls to InteractiveContext are no-ops outside of the
# notebook.
context = InteractiveContext()
WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/metadata.sqlite.

Run TFX components interactively

In the cells that follow, we create TFX components one by one, run each of them, and visualize their output artifacts.

ExampleGen

The ExampleGen component is usually at the start of a TFX pipeline. It will:

  1. Split data into training and evaluation sets (by default, 2/3 training + 1/3 eval)
  2. Convert data into the tf.Example format
  3. Copy data into the _tfx_root directory for other components to access
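
One common way to obtain a stable 2/3 to 1/3 split is to hash each record into a fixed number of buckets and assign buckets to splits. The sketch below illustrates that idea with hashlib; the assign_split helper, the choice of SHA-256, and the synthetic records are assumptions of this example, and ExampleGen's own hashing may differ.

```python
import hashlib


def assign_split(record: bytes, train_buckets: int = 2,
                 eval_buckets: int = 1) -> str:
    """Map a record to 'train' or 'eval' by hashing it into buckets."""
    total = train_buckets + eval_buckets
    bucket = int(hashlib.sha256(record).hexdigest(), 16) % total
    return "train" if bucket < train_buckets else "eval"


# Synthetic stand-ins for serialized rows.
records = [f"row-{i}".encode() for i in range(3000)]

counts = {"train": 0, "eval": 0}
for record in records:
    counts[assign_split(record)] += 1

print(counts)
```

Because the assignment depends only on the record's content, rerunning the split sends every record to the same side, and the proportions approach 2:1 as the number of records grows.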

ExampleGen takes as input the path to your data source. In our case, this is the _data_root path that contains the downloaded CSV.

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
context.run(example_gen)
INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen
INFO:absl:Generating examples.
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
INFO:absl:Processing input csv data /tmp/tfx-datazpmc_2k0/* to TFExample.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized

Let's examine the output artifacts of ExampleGen. This component produces two artifacts, training examples and evaluation examples:

artifact = example_gen.outputs['examples'].get()[0]
print(artifact.split_names, artifact.uri)
["train", "eval"] /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/CsvExampleGen/examples/1

We can also take a look at the first three training examples:

# Get the URI of the output artifact representing the training examples, which is a directory
train_uri = os.path.join(example_gen.outputs['examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)
features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Chicago Elite Cab Corp. (Chicago Carriag"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 12.449999809265137
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Credit Card"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1400269500
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Taxi Affiliation Services"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 27.049999237060547
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.836151123046875
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.64878845214844
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 12.600000381469727
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 1380
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1380593700
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 16.450000762939453
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.98363494873047
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.72357940673828
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 6.900000095367432
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 780
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1446554700
      }
    }
  }
}
2021-07-27 09:07:45.320714: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-27 09:07:46.182366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.183343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-27 09:07:46.183380: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-27 09:07:46.186392: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-27 09:07:46.186495: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-27 09:07:46.187463: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-27 09:07:46.187792: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-27 09:07:46.188571: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-27 09:07:46.189318: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-27 09:07:46.189509: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-27 09:07:46.189639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.190658: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.191561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-27 09:07:46.192211: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-27 09:07:46.192702: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.193636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-27 09:07:46.193733: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.194748: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.195660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-27 09:07:46.195706: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-27 09:07:46.821426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-27 09:07:46.821464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-27 09:07:46.821473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-27 09:07:46.821754: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.822812: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.823824: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-27 09:07:46.824737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
2021-07-27 09:07:46.854033: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-07-27 09:07:46.854452: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2000185000 Hz

Now that ExampleGen has finished ingesting the data, the next step is data analysis.

StatisticsGen

The StatisticsGen component computes statistics over your dataset for data analysis, as well as for use in downstream components. It uses the TensorFlow Data Validation library.
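
To give a feel for the kind of per-feature summary involved, here is a plain-Python sketch that computes a few statistics for one numeric column. It is not the TensorFlow Data Validation implementation; numeric_stats is a hypothetical helper, and the values are the pickup_community_area entries from the nine CSV rows printed earlier, with None standing in for empty cells.

```python
import math


def numeric_stats(values):
    """Summarize a numeric column in which None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0, "missing": len(values)}
    mean = sum(present) / len(present)
    variance = sum((v - mean) ** 2 for v in present) / len(present)
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean,
        "std_dev": math.sqrt(variance),
        "min": min(present),
        "max": max(present),
    }


# pickup_community_area from the first nine rows of data.csv.
pickup_community_area = [None, None, 60, 10, 14, 13, 16, 30, 11]

stats = numeric_stats(pickup_community_area)
print(stats)
```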

StatisticsGen takes as input the dataset we just ingested using ExampleGen.

statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
context.run(statistics_gen)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/StatisticsGen/statistics/2/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/StatisticsGen/statistics/2/Split-eval.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:absl:Running publisher for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized

After StatisticsGen finishes running, we can visualize the output statistics. Try playing with the different plots!

context.show(statistics_gen.outputs['statistics'])

SchemaGen

The SchemaGen component generates a schema based on your data statistics. (A schema defines the expected bounds, types, and properties of the features in your dataset.) It also uses the TensorFlow Data Validation library.

SchemaGen takes as input the statistics that we generated with StatisticsGen, looking at the training split by default.

schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(schema_gen)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
2021-07-27 09:07:50.046440: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Running executor for SchemaGen
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/SchemaGen/schema/3/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized

After SchemaGen finishes running, we can visualize the generated schema as a table.

context.show(schema_gen.outputs['schema'])
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/display_util.py:180: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.
  pd.set_option('max_colwidth', -1)

Each feature in your dataset shows up as a row in the schema table, alongside its properties. The schema also captures all the values that a categorical feature takes on, denoted as its domain.

To learn more about schemas, see the SchemaGen documentation.
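
As a rough illustration of what inferring a schema means, the sketch below derives a type for each feature and a domain for string features from a handful of rows. The infer_type and infer_schema helpers and the dictionary layout are invented for this example; the schema SchemaGen actually writes is a protocol buffer (schema.pbtxt) with more properties. The row values come from the first rows of data.csv shown earlier.

```python
def infer_type(values):
    """Pick the narrowest type name that fits all non-missing values."""
    if not values:
        return "UNKNOWN"
    if all(isinstance(v, int) for v in values):
        return "INT"
    if all(isinstance(v, (int, float)) for v in values):
        return "FLOAT"
    return "BYTES"


def infer_schema(rows):
    """Infer a type per feature, plus a sorted domain for string features."""
    schema = {}
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] is not None]
        entry = {"type": infer_type(values)}
        if entry["type"] == "BYTES":
            entry["domain"] = sorted(set(values))
        schema[name] = entry
    return schema


rows = [
    {"fare": 12.45, "trip_seconds": 0, "payment_type": "Credit Card"},
    {"fare": 0.0, "trip_seconds": 300, "payment_type": "Unknown"},
    {"fare": 27.05, "trip_seconds": 1380, "payment_type": "Cash"},
    {"fare": 5.85, "trip_seconds": 180, "payment_type": "Cash"},
]

schema = infer_schema(rows)
print(schema)
```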

ExampleValidator

The ExampleValidator component detects anomalies in your data, based on the expectations defined by the schema. It also uses the TensorFlow Data Validation library.

ExampleValidator takes as input the statistics from StatisticsGen and the schema from SchemaGen.

example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for ExampleValidator
INFO:absl:Validating schema against the computed statistics for split train.
INFO:absl:Validation complete for split train. Anomalies written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/ExampleValidator/anomalies/4/Split-train.
INFO:absl:Validating schema against the computed statistics for split eval.
INFO:absl:Validation complete for split eval. Anomalies written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/ExampleValidator/anomalies/4/Split-eval.
INFO:absl:Running publisher for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized

After ExampleValidator finishes running, we can visualize the anomalies as a table.

context.show(example_validator.outputs['anomalies'])
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/display_util.py:217: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.
  pd.set_option('max_colwidth', -1)

In the anomalies table, we can see that there are no anomalies. This is what we would expect, since this is the first dataset we have analyzed and the schema is tailored to it. You should review this schema: anything unexpected means an anomaly in the data. Once reviewed, the schema can be used to guard future data, and anomalies produced here can be used to debug model performance, understand how your data evolves over time, and identify data errors.
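
The following sketch shows what guarding future data with a schema could look like in miniature. It checks rows one by one against a hand-written schema, whereas ExampleValidator compares summary statistics against the schema; the find_anomalies helper, the schema layout, and the "Prepaid" value are all made up for this illustration.

```python
# Hypothetical, hand-written schema for two features.
schema = {
    "trip_seconds": {"type": int},
    "payment_type": {"type": str,
                     "domain": {"Cash", "Credit Card", "Unknown"}},
}


def find_anomalies(schema, rows):
    """Return (row index, feature, reason) for each value breaking the schema."""
    anomalies = []
    for index, row in enumerate(rows):
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                continue  # Missing values are tolerated in this sketch.
            if not isinstance(value, spec["type"]):
                anomalies.append((index, name, "unexpected type"))
            elif "domain" in spec and value not in spec["domain"]:
                anomalies.append((index, name, "value outside domain"))
    return anomalies


known_data = [{"trip_seconds": 300, "payment_type": "Cash"}]
future_data = [
    {"trip_seconds": 780, "payment_type": "Prepaid"},  # unseen category
    {"trip_seconds": "780", "payment_type": "Cash"},   # wrong type
]

print(find_anomalies(schema, known_data))
print(find_anomalies(schema, future_data))
```

Data that the schema was derived from passes cleanly, while later data with a new category or a changed type is flagged for review.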

Transform

The Transform component performs feature engineering for both training and serving. It uses the TensorFlow Transform library.

Transform takes as input the data from ExampleGen, the schema from SchemaGen, and a module that contains user-defined Transform code.

Let's see an example of user-defined Transform code below (for an introduction to the TensorFlow Transform APIs, see the tutorial). First, we define a few constants for feature engineering:

_taxi_constants_module_file = 'taxi_constants.py'
%%writefile {_taxi_constants_module_file}

# Categorical features are assumed to each have a maximum value in the dataset.
MAX_CATEGORICAL_FEATURE_VALUES = [24, 31, 12]

CATEGORICAL_FEATURE_KEYS = [
    'trip_start_hour', 'trip_start_day', 'trip_start_month',
    'pickup_census_tract', 'dropoff_census_tract', 'pickup_community_area',
    'dropoff_community_area'
]

DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare', 'trip_seconds']

# Number of buckets used by tf.transform for encoding each feature.
FEATURE_BUCKET_COUNT = 10

BUCKET_FEATURE_KEYS = [
    'pickup_latitude', 'pickup_longitude', 'dropoff_latitude',
    'dropoff_longitude'
]

# Number of vocabulary terms used for encoding VOCAB_FEATURES by tf.transform
VOCAB_SIZE = 1000

# Count of out-of-vocab buckets in which unrecognized VOCAB_FEATURES are hashed.
OOV_SIZE = 10

VOCAB_FEATURE_KEYS = [
    'payment_type',
    'company',
]

# Keys
LABEL_KEY = 'tips'
FARE_KEY = 'fare'

def transformed_name(key):
  return key + '_xf'
Writing taxi_constants.py
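
The VOCAB_SIZE and OOV_SIZE constants control how string features are turned into integer ids: the most frequent values get their own index, and every other value is hashed into one of a fixed number of out-of-vocabulary buckets placed after the vocabulary. Here is a plain-Python sketch of that scheme; build_vocabulary and apply_vocabulary are hypothetical helpers, and the tie-breaking and CRC32 hashing are assumptions of this example rather than what tf.transform does internally.

```python
import zlib
from collections import Counter


def build_vocabulary(values, top_k):
    """Return the top_k most frequent values, most frequent first."""
    counts = Counter(values)
    # Sort by descending count, then by value, so ties are deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return ranked[:top_k]


def apply_vocabulary(value, vocabulary, num_oov_buckets):
    """Map a value to its vocabulary index, or hash it into an OOV bucket."""
    if value in vocabulary:
        return vocabulary.index(value)
    # CRC32 is stable across runs, unlike Python's salted built-in hash().
    return len(vocabulary) + zlib.crc32(value.encode()) % num_oov_buckets


# payment_type values from the first nine rows of data.csv.
payment_types = ["Credit Card", "Unknown"] + ["Cash"] * 7

vocabulary = build_vocabulary(payment_types, top_k=2)
ids = [apply_vocabulary(v, vocabulary, num_oov_buckets=10)
       for v in payment_types]
print(vocabulary)
print(ids)
```

With a vocabulary of size 2 and 10 out-of-vocabulary buckets, in-vocabulary values map to 0 or 1, and anything else lands somewhere in the range 2 to 11.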

Next, we write a preprocessing_fn that takes in raw data as input and returns transformed features that our model can train on:

_taxi_transform_module_file = 'taxi_transform.py'
%%writefile {_taxi_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_FARE_KEY = taxi_constants.FARE_KEY
_LABEL_KEY = taxi_constants.LABEL_KEY
_transformed_name = taxi_constants.transformed_name


def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  for key in _DENSE_FLOAT_FEATURE_KEYS:
    # Scale this dense float feature to a z-score; missing values are filled with 0 first.
    outputs[_transformed_name(key)] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  for key in _VOCAB_FEATURE_KEYS:
    # Build a vocabulary for this feature.
    outputs[_transformed_name(key)] = tft.compute_and_apply_vocabulary(
        _fill_in_missing(inputs[key]),
        top_k=_VOCAB_SIZE,
        num_oov_buckets=_OOV_SIZE)

  for key in _BUCKET_FEATURE_KEYS:
    outputs[_transformed_name(key)] = tft.bucketize(
        _fill_in_missing(inputs[key]), _FEATURE_BUCKET_COUNT)

  for key in _CATEGORICAL_FEATURE_KEYS:
    outputs[_transformed_name(key)] = _fill_in_missing(inputs[key])

  # Was this passenger a big tipper?
  taxi_fare = _fill_in_missing(inputs[_FARE_KEY])
  tips = _fill_in_missing(inputs[_LABEL_KEY])
  outputs[_transformed_name(_LABEL_KEY)] = tf.where(
      tf.math.is_nan(taxi_fare),
      tf.cast(tf.zeros_like(taxi_fare), tf.int64),
      # Test if the tip was > 20% of the fare.
      tf.cast(
          tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))

  return outputs


def _fill_in_missing(x):
  """Replace missing values in a SparseTensor.
  Fills in missing values of `x` with '' or 0, and converts to a dense tensor.
  Args:
    x: A `SparseTensor` of rank 2.  Its dense shape should have size at most 1
      in the second dimension.
  Returns:
    A rank 1 tensor where missing values of `x` have been filled in.
  """
  if not isinstance(x, tf.sparse.SparseTensor):
    return x

  default_value = '' if x.dtype == tf.string else 0
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)
Writing taxi_transform.py
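
The per-example logic of the preprocessing function above can be sketched in plain Python. This is only an illustration using the standard library, not the tf.Transform implementation: the helper names `fill_in_missing`, `z_scores`, and `big_tipper_label` and the sample values are hypothetical, and the z-score here assumes the population standard deviation.

```python
import math
import statistics


def fill_in_missing(value, default=0.0):
    """Return `default` when a value is absent (None), similar in spirit to _fill_in_missing."""
    return default if value is None else value


def z_scores(values):
    """Scale values to mean 0 and unit variance (population standard deviation assumed)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so every z-score is defined as 0 here.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def big_tipper_label(fare, tips):
    """Return 1 if the tip exceeds 20% of the fare, else 0; a NaN fare yields 0."""
    if math.isnan(fare):
        return 0
    return int(tips > fare * 0.2)


# Hypothetical raw values; None stands in for a missing entry.
fares = [fill_in_missing(f) for f in [10.0, None, 20.0]]
tips = [fill_in_missing(t) for t in [3.0, 1.0, 2.0]]

labels = [big_tipper_label(f, t) for f, t in zip(fares, tips)]
scaled_fares = z_scores(fares)
```

Note that in tf.Transform the mean and variance are computed by an analyzer over the whole training split, whereas this sketch computes them over a small in-memory list.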

Now we pass this feature engineering code to the Transform component and run it to transform the data.

transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_taxi_transform_module_file))
context.run(transform)
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_transform.py' (including modules: ['taxi_constants', 'taxi_transform']).
INFO:absl:User module package has hash fingerprint version ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmp/tmpoqmg142s/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmpanjcjub4', '--dist-dir', '/tmp/tmpdsj16ttp']
INFO:absl:Successfully built user code wheel distribution at '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl'; target user module is 'taxi_transform'.
INFO:absl:Full user module path is 'taxi_transform@/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl'
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
2021-07-27 09:07:50.744965: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Running executor for Transform
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_constants.py -> build/lib
copying taxi_transform.py -> build/lib
installing to /tmp/tmpanjcjub4
running install
running install_lib
copying build/lib/taxi_constants.py -> /tmp/tmpanjcjub4
copying build/lib/taxi_transform.py -> /tmp/tmpanjcjub4
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmp/tmpanjcjub4/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3.7.egg-info
running install_scripts
creating /tmp/tmpanjcjub4/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f.dist-info/WHEEL
creating '/tmp/tmpdsj16ttp/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl' and adding '/tmp/tmpanjcjub4' to it
adding 'taxi_constants.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f.dist-info/METADATA'
adding 'tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f.dist-info/WHEEL'
adding 'tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f.dist-info/top_level.txt'
adding 'tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f.dist-info/RECORD'
removing /tmp/tmpanjcjub4
2021-07-27 09:07:50.749018: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Analyze the 'train' split and transform all splits when splits_config is not set.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
INFO:absl:Installing '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpgm2vzpks', '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl']
Processing /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl
WARNING: You are using pip version 21.1.3; however, version 21.2.1 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl', 'stats_options_updater_fn': None} 'stats_options_updater_fn'
INFO:absl:Installing '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpgvos8jkh', '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f
Processing /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl
WARNING: You are using pip version 21.1.3; however, version 21.2.1 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl'.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:266: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
INFO:absl:Installing '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpfzcjc_89', '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl']
Processing /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl
WARNING: You are using pip version 21.1.3; however, version 21.2.1 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f-py3-none-any.whl'.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType]] instead.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+ba15fceb350294024553cb2f31d9929992f91dcaa3af4f05811c926d31c25e8f
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType]] instead.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
2021-07-27 09:08:04.817364: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/transform_graph/5/.temp_path/tftransform_tmp/da700272a3d54e20a2b7ccdc18a4fecc/assets
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/transform_graph/5/.temp_path/tftransform_tmp/688d8c4944014a3cb6680dd2e0499418/assets
INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized

Let's examine the output artifacts of Transform. This component produces two types of outputs:

  • transform_graph is the graph that can perform the preprocessing operations (this graph will be included in the serving and evaluation models).
  • transformed_examples represents the preprocessed training and evaluation data.
transform.outputs
{'transform_graph': Channel(
     type_name: TransformGraph
     artifacts: [Artifact(artifact: id: 5
 type_id: 13
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/transform_graph/5"
 custom_properties {
   key: "name"
   value {
     string_value: "transform_graph"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 13
 name: "TransformGraph"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'transformed_examples': Channel(
     type_name: Examples
     artifacts: [Artifact(artifact: id: 6
 type_id: 5
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/transformed_examples/5"
 properties {
   key: "split_names"
   value {
     string_value: "[\"train\", \"eval\"]"
   }
 }
 custom_properties {
   key: "name"
   value {
     string_value: "transformed_examples"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 5
 name: "Examples"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 properties {
   key: "version"
   value: INT
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'updated_analyzer_cache': Channel(
     type_name: TransformCache
     artifacts: [Artifact(artifact: id: 7
 type_id: 14
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/updated_analyzer_cache/5"
 custom_properties {
   key: "name"
   value {
     string_value: "updated_analyzer_cache"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 14
 name: "TransformCache"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'pre_transform_schema': Channel(
     type_name: Schema
     artifacts: [Artifact(artifact: id: 8
 type_id: 9
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/pre_transform_schema/5"
 custom_properties {
   key: "name"
   value {
     string_value: "pre_transform_schema"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 9
 name: "Schema"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'pre_transform_stats': Channel(
     type_name: ExampleStatistics
     artifacts: [Artifact(artifact: id: 9
 type_id: 7
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/pre_transform_stats/5"
 custom_properties {
   key: "name"
   value {
     string_value: "pre_transform_stats"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 7
 name: "ExampleStatistics"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'post_transform_schema': Channel(
     type_name: Schema
     artifacts: [Artifact(artifact: id: 10
 type_id: 9
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/post_transform_schema/5"
 custom_properties {
   key: "name"
   value {
     string_value: "post_transform_schema"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 9
 name: "Schema"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'post_transform_stats': Channel(
     type_name: ExampleStatistics
     artifacts: [Artifact(artifact: id: 11
 type_id: 7
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/post_transform_stats/5"
 custom_properties {
   key: "name"
   value {
     string_value: "post_transform_stats"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 7
 name: "ExampleStatistics"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'post_transform_anomalies': Channel(
     type_name: ExampleAnomalies
     artifacts: [Artifact(artifact: id: 12
 type_id: 11
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Transform/post_transform_anomalies/5"
 custom_properties {
   key: "name"
   value {
     string_value: "post_transform_anomalies"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 11
 name: "ExampleAnomalies"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 )}

Take a look at the transform_graph artifact. It points to a directory containing three subdirectories.

transform_graph_uri = transform.outputs['transform_graph'].get()[0].uri
os.listdir(transform_graph_uri)
['transform_fn', 'transformed_metadata', 'metadata']

The transformed_metadata subdirectory contains the schema of the preprocessed data. The transform_fn subdirectory contains the actual preprocessing graph. The metadata subdirectory contains the schema of the original data.
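
The layout described above can be checked with a small helper. The following is a minimal sketch using only the standard library: `SUBDIR_ROLES` and `describe_transform_graph` are hypothetical names, and a temporary directory stands in for the real artifact URI so that the example runs without a pipeline.

```python
import os
import tempfile

# Roles as described in the text; the keys match the directory listing.
SUBDIR_ROLES = {
    "transform_fn": "the preprocessing graph itself",
    "transformed_metadata": "schema of the preprocessed data",
    "metadata": "schema of the original data",
}


def describe_transform_graph(uri):
    """Map each known subdirectory found under `uri` to its role."""
    return {
        name: SUBDIR_ROLES[name]
        for name in sorted(os.listdir(uri))
        if name in SUBDIR_ROLES and os.path.isdir(os.path.join(uri, name))
    }


# Build a stand-in directory tree (an assumption for illustration only).
with tempfile.TemporaryDirectory() as fake_uri:
    for name in SUBDIR_ROLES:
        os.makedirs(os.path.join(fake_uri, name))
    layout = describe_transform_graph(fake_uri)

for name, role in layout.items():
    print(f"{name}: {role}")
```

In the notebook, the same helper could be pointed at the URI of the transform_graph artifact instead of the temporary directory.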

We can also take a look at the first three transformed examples:

# Get the URI of the output artifact representing the transformed examples, which is a directory
train_uri = os.path.join(transform.outputs['transformed_examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)
features {
  feature {
    key: "company_xf"
    value {
      int64_list {
        value: 8
      }
    }
  }
  feature {
    key: "dropoff_census_tract_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude_xf"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare_xf"
    value {
      float_list {
        value: 0.061060599982738495
      }
    }
  }
  feature {
    key: "payment_type_xf"
    value {
      int64_list {
        value: 1
      }
    }
  }
  feature {
    key: "pickup_census_tract_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_latitude_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude_xf"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "tips_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles_xf"
    value {
      float_list {
        value: -0.15886740386486053
      }
    }
  }
  feature {
    key: "trip_seconds_xf"
    value {
      float_list {
        value: -0.7118487358093262
      }
    }
  }
  feature {
    key: "trip_start_day_xf"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour_xf"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month_xf"
    value {
      int64_list {
        value: 5
      }
    }
  }
}

features {
  feature {
    key: "company_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_census_tract_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude_xf"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare_xf"
    value {
      float_list {
        value: 1.2521240711212158
      }
    }
  }
  feature {
    key: "payment_type_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area_xf"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude_xf"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "tips_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles_xf"
    value {
      float_list {
        value: 0.532160758972168
      }
    }
  }
  feature {
    key: "trip_seconds_xf"
    value {
      float_list {
        value: 0.5509493350982666
      }
    }
  }
  feature {
    key: "trip_start_day_xf"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour_xf"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month_xf"
    value {
      int64_list {
        value: 10
      }
    }
  }
}

features {
  feature {
    key: "company_xf"
    value {
      int64_list {
        value: 48
      }
    }
  }
  feature {
    key: "dropoff_census_tract_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude_xf"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare_xf"
    value {
      float_list {
        value: 0.3873794376850128
      }
    }
  }
  feature {
    key: "payment_type_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area_xf"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude_xf"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "pickup_longitude_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "tips_xf"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles_xf"
    value {
      float_list {
        value: 0.21955278515815735
      }
    }
  }
  feature {
    key: "trip_seconds_xf"
    value {
      float_list {
        value: 0.0019067146349698305
      }
    }
  }
  feature {
    key: "trip_start_day_xf"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour_xf"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month_xf"
    value {
      int64_list {
        value: 11
      }
    }
  }
}
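
The transformed values above come in two visibly different forms: floats centered near zero (for example fare_xf) and small integer indices (for example dropoff_longitude_xf). The plain-Python sketch below illustrates two transformations of the kind that would produce such values, z-score scaling and bucketization. It assumes that the preprocessing function defined earlier in this notebook applies these techniques; the helper names, the sample values, the bucket boundaries and the use of the population standard deviation are all illustrative and not taken from the notebook.

```python
import bisect
import statistics


def z_scores(values):
    """Scales values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population std, for illustration
    return [(v - mean) / std for v in values]


def bucketize(values, boundaries):
    """Maps each value to the index of the bucket it falls into.

    `boundaries` must be sorted ascending; n boundaries define n + 1 buckets.
    """
    return [bisect.bisect_right(boundaries, v) for v in values]


# Hypothetical raw values, not taken from the dataset.
fares = [5.0, 10.0, 15.0]
longitudes = [-87.9, -87.7, -87.6]

scaled_fares = z_scores(fares)
longitude_buckets = bucketize(longitudes, boundaries=[-87.8, -87.65])

print(scaled_fares)
print(longitude_buckets)
```

Scaled values of this kind are what a model's dense inputs typically consume, while bucket indices can be treated as categorical identities, which is how the trainer code in the next section uses the bucketized features.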

After the Transform component has transformed your data into features, the next step is to train a model.

Trainer

The Trainer component trains a model that you define in TensorFlow. By default, Trainer supports the Estimator API; to use the Keras API, you need to specify the generic Trainer by setting custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor) in the Trainer constructor.

Trainer takes as input the schema from SchemaGen, the transformed data and graph from Transform, training parameters, and a module that contains user-defined model code.

Let's look at an example of user-defined model code (for an introduction to the TensorFlow Keras APIs, see the tutorial):

_taxi_trainer_module_file = 'taxi_trainer.py'
%%writefile {_taxi_trainer_module_file}

from typing import List, Text

import os
import absl
import datetime
import tensorflow as tf
import tensorflow_transform as tft

from tfx import v1 as tfx
from tfx_bsl.public import tfxio

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_MAX_CATEGORICAL_FEATURE_VALUES = taxi_constants.MAX_CATEGORICAL_FEATURE_VALUES
_LABEL_KEY = taxi_constants.LABEL_KEY
_transformed_name = taxi_constants.transformed_name


def _transformed_names(keys):
  return [_transformed_name(key) for key in keys]


def _get_serve_tf_examples_fn(model, tf_transform_output):
  """Returns a function that parses a serialized tf.Example and applies TFT."""

  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Returns the output to be used in the serving signature."""
    feature_spec = tf_transform_output.raw_feature_spec()
    feature_spec.pop(_LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    return model(transformed_features)

  return serve_tf_examples_fn


def _input_fn(file_pattern: List[Text],
              data_accessor: tfx.components.DataAccessor,
              tf_transform_output: tft.TFTransformOutput,
              batch_size: int = 200) -> tf.data.Dataset:
  """Generates features and label for tuning/training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_transformed_name(_LABEL_KEY)),
      tf_transform_output.transformed_metadata.schema)


def _build_keras_model(hidden_units: List[int] = None) -> tf.keras.Model:
  """Creates a DNN Keras model for classifying taxi data.

  Args:
    hidden_units: [int], the layer sizes of the DNN (input layer first).

  Returns:
    A keras Model.
  """
  real_valued_columns = [
      tf.feature_column.numeric_column(key, shape=())
      for key in _transformed_names(_DENSE_FLOAT_FEATURE_KEYS)
  ]
  categorical_columns = [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_VOCAB_SIZE + _OOV_SIZE, default_value=0)
      for key in _transformed_names(_VOCAB_FEATURE_KEYS)
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_FEATURE_BUCKET_COUNT, default_value=0)
      for key in _transformed_names(_BUCKET_FEATURE_KEYS)
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(  # pylint: disable=g-complex-comprehension
          key,
          num_buckets=num_buckets,
          default_value=0) for key, num_buckets in zip(
              _transformed_names(_CATEGORICAL_FEATURE_KEYS),
              _MAX_CATEGORICAL_FEATURE_VALUES)
  ]
  indicator_column = [
      tf.feature_column.indicator_column(categorical_column)
      for categorical_column in categorical_columns
  ]

  model = _wide_and_deep_classifier(
      # TODO(b/139668410) replace with premade wide_and_deep keras model
      wide_columns=indicator_column,
      deep_columns=real_valued_columns,
      dnn_hidden_units=hidden_units or [100, 70, 50, 25])
  return model


def _wide_and_deep_classifier(wide_columns, deep_columns, dnn_hidden_units):
  """Build a simple keras wide and deep model.

  Args:
    wide_columns: Feature columns wrapped in indicator_column for wide (linear)
      part of the model.
    deep_columns: Feature columns for deep part of the model.
    dnn_hidden_units: [int], the layer sizes of the hidden DNN.

  Returns:
    A Wide and Deep Keras model
  """
  # The following values are hard-coded for simplicity in this example;
  # preferably, they should be passed in as hparams.

  # Keras needs the feature definitions at compile time.
  # TODO(b/139081439): Automate generation of input layers from FeatureColumn.
  input_layers = {
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype=tf.float32)
      for colname in _transformed_names(_DENSE_FLOAT_FEATURE_KEYS)
  }
  input_layers.update({
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype='int32')
      for colname in _transformed_names(_VOCAB_FEATURE_KEYS)
  })
  input_layers.update({
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype='int32')
      for colname in _transformed_names(_BUCKET_FEATURE_KEYS)
  })
  input_layers.update({
      colname: tf.keras.layers.Input(name=colname, shape=(), dtype='int32')
      for colname in _transformed_names(_CATEGORICAL_FEATURE_KEYS)
  })

  # TODO(b/161952382): Replace with Keras preprocessing layers.
  deep = tf.keras.layers.DenseFeatures(deep_columns)(input_layers)
  for numnodes in dnn_hidden_units:
    deep = tf.keras.layers.Dense(numnodes)(deep)
  wide = tf.keras.layers.DenseFeatures(wide_columns)(input_layers)

  output = tf.keras.layers.Dense(1)(
          tf.keras.layers.concatenate([deep, wide]))

  model = tf.keras.Model(input_layers, output)
  model.compile(
      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
      optimizer=tf.keras.optimizers.Adam(lr=0.001),
      metrics=[tf.keras.metrics.BinaryAccuracy()])
  model.summary(print_fn=absl.logging.info)
  return model


# TFX Trainer will call this function.
def run_fn(fn_args: tfx.components.FnArgs):
  """Train the model based on given args.

  Args:
    fn_args: Holds args used to train the model as name/value pairs.
  """
  # Number of nodes in the first layer of the DNN
  first_dnn_layer_size = 100
  num_dnn_layers = 4
  dnn_decay_factor = 0.7

  tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

  train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor, 
                            tf_transform_output, 40)
  eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor, 
                           tf_transform_output, 40)

  model = _build_keras_model(
      # Construct layer sizes with exponential decay
      hidden_units=[
          max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
          for i in range(num_dnn_layers)
      ])

  tensorboard_callback = tf.keras.callbacks.TensorBoard(
      log_dir=fn_args.model_run_dir, update_freq='batch')
  model.fit(
      train_dataset,
      steps_per_epoch=fn_args.train_steps,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps,
      callbacks=[tensorboard_callback])

  signatures = {
      'serving_default':
          _get_serve_tf_examples_fn(model,
                                    tf_transform_output).get_concrete_function(
                                        tf.TensorSpec(
                                            shape=[None],
                                            dtype=tf.string,
                                            name='examples')),
  }
  model.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)
Writing taxi_trainer.py
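
The hidden layer sizes that run_fn passes to _build_keras_model follow from the three constants at the top of run_fn. The sketch below recomputes them in plain Python and derives the parameter count of each Dense layer as inputs × units plus one bias per unit. It assumes that the deep branch receives three dense float features, which is what the (None, 3) output shape of dense_features in the model summary logged by Trainer suggests. The helper name dense_param_count is hypothetical and not part of the module.

```python
# Constants copied from run_fn in taxi_trainer.py.
first_dnn_layer_size = 100
num_dnn_layers = 4
dnn_decay_factor = 0.7

# Same expression as in run_fn: each layer is 0.7x the previous one,
# truncated by int() and never smaller than 2 units.
hidden_units = [
    max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
    for i in range(num_dnn_layers)
]


def dense_param_count(num_inputs, num_units):
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return num_inputs * num_units + num_units


# Assumption: three dense float features feed the deep branch.
num_deep_inputs = 3

param_counts = []
previous_width = num_deep_inputs
for units in hidden_units:
    param_counts.append(dense_param_count(previous_width, units))
    previous_width = units

print(hidden_units)   # [100, 70, 48, 34]
print(param_counts)   # [400, 7070, 3408, 1666]
```

Because int() truncates rather than rounds, the third layer has 48 units instead of 49 (the floating-point product is slightly below 49). These sizes and parameter counts agree with the dense, dense_1, dense_2 and dense_3 rows of the model summary that Trainer logs when it runs.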

Now we pass this model code to the Trainer component and run it to train the model.

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(_taxi_trainer_module_file),
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=tfx.proto.TrainArgs(num_steps=10000),
    eval_args=tfx.proto.EvalArgs(num_steps=5000))
context.run(trainer)
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py' (including modules: ['taxi_constants', 'taxi_trainer', 'taxi_transform']).
INFO:absl:User module package has hash fingerprint version 3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmp/tmp_lnvffb5/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmpc26kvw4n', '--dist-dir', '/tmp/tmpjsv1y_4u']
INFO:absl:Successfully built user code wheel distribution at '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl'; target user module is 'taxi_trainer'.
INFO:absl:Full user module path is 'taxi_trainer@/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl'
INFO:absl:Running driver for Trainer
INFO:absl:MetadataStore with DB connection initialized
2021-07-27 09:08:19.212010: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Running executor for Trainer
2021-07-27 09:08:19.215726: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
INFO:absl:udf_utils.get_fn {'train_args': '{\n  "num_steps": 10000\n}', 'eval_args': '{\n  "num_steps": 5000\n}', 'module_file': None, 'run_fn': None, 'trainer_fn': None, 'custom_config': 'null', 'module_path': 'taxi_trainer@/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl'} 'run_fn'
INFO:absl:Installing '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp7xh_pre7', '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl']
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_constants.py -> build/lib
copying taxi_trainer.py -> build/lib
copying taxi_transform.py -> build/lib
installing to /tmp/tmpc26kvw4n
running install
running install_lib
copying build/lib/taxi_constants.py -> /tmp/tmpc26kvw4n
copying build/lib/taxi_transform.py -> /tmp/tmpc26kvw4n
copying build/lib/taxi_trainer.py -> /tmp/tmpc26kvw4n
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmpc26kvw4n/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3.7.egg-info
running install_scripts
creating /tmp/tmpc26kvw4n/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7.dist-info/WHEEL
creating '/tmp/tmpjsv1y_4u/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl' and adding '/tmp/tmpc26kvw4n' to it
adding 'taxi_constants.py'
adding 'taxi_trainer.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7.dist-info/RECORD'
removing /tmp/tmpc26kvw4n
Processing /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl
WARNING: You are using pip version 21.1.3; however, version 21.2.1 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/_wheels/tfx_user_code_Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7-py3-none-any.whl'.
INFO:absl:Training model.
INFO:absl:Feature company_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature fare_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature tips_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month_xf has a shape . Setting to DenseTensor.
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+3acd02058a78fc9e40d70144d392b74161d6b10802fdd25a94793cf0145193b7
INFO:absl:Feature company_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature fare_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature tips_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature company_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature fare_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature tips_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature company_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature fare_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature tips_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour_xf has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month_xf has a shape . Setting to DenseTensor.
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  "The `lr` argument is deprecated, use `learning_rate` instead.")
INFO:absl:Model: "model"
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Layer (type)                    Output Shape         Param #     Connected to                     
INFO:absl:==================================================================================================
INFO:absl:company_xf (InputLayer)         [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dropoff_census_tract_xf (InputL [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dropoff_community_area_xf (Inpu [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dropoff_latitude_xf (InputLayer [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dropoff_longitude_xf (InputLaye [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:fare_xf (InputLayer)            [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:payment_type_xf (InputLayer)    [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:pickup_census_tract_xf (InputLa [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:pickup_community_area_xf (Input [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:pickup_latitude_xf (InputLayer) [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:pickup_longitude_xf (InputLayer [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:trip_miles_xf (InputLayer)      [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:trip_seconds_xf (InputLayer)    [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:trip_start_day_xf (InputLayer)  [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:trip_start_hour_xf (InputLayer) [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:trip_start_month_xf (InputLayer [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_features (DenseFeatures)  (None, 3)            0           company_xf[0][0]                 
INFO:absl:                                                                 dropoff_census_tract_xf[0][0]    
INFO:absl:                                                                 dropoff_community_area_xf[0][0]  
INFO:absl:                                                                 dropoff_latitude_xf[0][0]        
INFO:absl:                                                                 dropoff_longitude_xf[0][0]       
INFO:absl:                                                                 fare_xf[0][0]                    
INFO:absl:                                                                 payment_type_xf[0][0]            
INFO:absl:                                                                 pickup_census_tract_xf[0][0]     
INFO:absl:                                                                 pickup_community_area_xf[0][0]   
INFO:absl:                                                                 pickup_latitude_xf[0][0]         
INFO:absl:                                                                 pickup_longitude_xf[0][0]        
INFO:absl:                                                                 trip_miles_xf[0][0]              
INFO:absl:                                                                 trip_seconds_xf[0][0]            
INFO:absl:                                                                 trip_start_day_xf[0][0]          
INFO:absl:                                                                 trip_start_hour_xf[0][0]         
INFO:absl:                                                                 trip_start_month_xf[0][0]        
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense (Dense)                   (None, 100)          400         dense_features[0][0]             
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_1 (Dense)                 (None, 70)           7070        dense[0][0]                      
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_2 (Dense)                 (None, 48)           3408        dense_1[0][0]                    
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_3 (Dense)                 (None, 34)           1666        dense_2[0][0]                    
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_features_1 (DenseFeatures (None, 2127)         0           company_xf[0][0]                 
INFO:absl:                                                                 dropoff_census_tract_xf[0][0]    
INFO:absl:                                                                 dropoff_community_area_xf[0][0]  
INFO:absl:                                                                 dropoff_latitude_xf[0][0]        
INFO:absl:                                                                 dropoff_longitude_xf[0][0]       
INFO:absl:                                                                 fare_xf[0][0]                    
INFO:absl:                                                                 payment_type_xf[0][0]            
INFO:absl:                                                                 pickup_census_tract_xf[0][0]     
INFO:absl:                                                                 pickup_community_area_xf[0][0]   
INFO:absl:                                                                 pickup_latitude_xf[0][0]         
INFO:absl:                                                                 pickup_longitude_xf[0][0]        
INFO:absl:                                                                 trip_miles_xf[0][0]              
INFO:absl:                                                                 trip_seconds_xf[0][0]            
INFO:absl:                                                                 trip_start_day_xf[0][0]          
INFO:absl:                                                                 trip_start_hour_xf[0][0]         
INFO:absl:                                                                 trip_start_month_xf[0][0]        
INFO:absl:__________________________________________________________________________________________________
INFO:absl:concatenate (Concatenate)       (None, 2161)         0           dense_3[0][0]                    
INFO:absl:                                                                 dense_features_1[0][0]           
INFO:absl:__________________________________________________________________________________________________
INFO:absl:dense_4 (Dense)                 (None, 1)            2162        concatenate[0][0]                
INFO:absl:==================================================================================================
INFO:absl:Total params: 14,706
INFO:absl:Trainable params: 14,706
INFO:absl:Non-trainable params: 0
INFO:absl:__________________________________________________________________________________________________
2021-07-27 09:08:21.580710: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2021-07-27 09:08:21.580762: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2021-07-27 09:08:21.580858: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1611] Profiler found 1 GPUs
2021-07-27 09:08:21.643763: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcupti.so.11.2
2021-07-27 09:08:21.847952: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down.
2021-07-27 09:08:21.853003: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1743] CUPTI activity buffer flushed
2021-07-27 09:08:23.104739: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
1/10000 [..............................] - ETA: 4:30:47 - loss: 0.6922 - binary_accuracy: 0.7750
2021-07-27 09:08:23.544669: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-27 09:08:23.585375: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2021-07-27 09:08:23.585425: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
21/10000 [..............................] - ETA: 3:42 - loss: 0.6493 - binary_accuracy: 0.7821
2021-07-27 09:08:23.797730: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2021-07-27 09:08:23.800405: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1743] CUPTI activity buffer flushed
2021-07-27 09:08:23.833723: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:673]  GpuTracer has collected 301 callback api events and 298 activity events. 
2021-07-27 09:08:23.841253: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down.
2021-07-27 09:08:23.850367: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23
2021-07-27 09:08:23.857266: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.trace.json.gz
2021-07-27 09:08:23.876579: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23
2021-07-27 09:08:23.879755: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for memory_profile.json.gz to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.memory_profile.json.gz
2021-07-27 09:08:23.880489: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23
Dumped tool data for xplane.pb to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.xplane.pb
Dumped tool data for overview_page.pb to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.overview_page.pb
Dumped tool data for input_pipeline.pb to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6/train/plugins/profile/2021_07_27_09_08_23/kokoro-gcp-ubuntu-prod-762616165.kernel_stats.pb
10000/10000 [==============================] - 83s 8ms/step - loss: 0.2376 - binary_accuracy: 0.8600 - val_loss: 0.2218 - val_binary_accuracy: 0.8729
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model/6/Format-Serving/assets
INFO:absl:Training complete. Model written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model/6/Format-Serving. ModelRun written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model_run/6
INFO:absl:Running publisher for Trainer
INFO:absl:MetadataStore with DB connection initialized

Analyze Training with TensorBoard

Take a look at the Trainer's model artifact. It points to a directory that contains the model subdirectories.

model_artifact_dir = trainer.outputs['model'].get()[0].uri
pp.pprint(os.listdir(model_artifact_dir))
model_dir = os.path.join(model_artifact_dir, 'Format-Serving')
pp.pprint(os.listdir(model_dir))
['Format-Serving']
['variables', 'assets', 'keras_metadata.pb', 'saved_model.pb']
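
The listing above shows a saved_model.pb file next to a variables directory, which is the usual layout of an exported SavedModel. As a minimal sketch, assuming that layout is what you want to check for, a small standard-library helper can confirm that a directory looks like an exported model before you hand it to other tools. The name looks_like_saved_model is hypothetical and not part of TFX.

```python
import os


def looks_like_saved_model(model_dir):
    """Heuristic check: does model_dir contain the files a SavedModel export usually has?

    This only inspects file names; it does not load or validate the model.
    """
    has_graph = os.path.isfile(os.path.join(model_dir, 'saved_model.pb'))
    has_variables = os.path.isdir(os.path.join(model_dir, 'variables'))
    return has_graph and has_variables
```

In this notebook you could call it as looks_like_saved_model(model_dir), using the model_dir variable defined in the cell above.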

Optionally, we can connect TensorBoard to the Trainer's model_run output to analyze our model's training curves.

model_run_artifact_dir = trainer.outputs['model_run'].get()[0].uri

%load_ext tensorboard
%tensorboard --logdir {model_run_artifact_dir}

Evaluator

The Evaluator component computes model performance metrics over the evaluation set. It uses the TensorFlow Model Analysis library. The Evaluator can also optionally validate that a newly trained model is better than the previous model. This is useful in a production pipeline setting where you may automatically train and validate a model every day. In this notebook, we train only one model, so the Evaluator automatically labels the model as "good".

Evaluator takes as input the data from ExampleGen, the trained model from Trainer, and a slicing configuration. The slicing configuration lets you slice your metrics over feature values (for example, how does your model perform on taxi trips that start at 8 a.m. versus 8 p.m.?). See an example of this configuration below:

eval_config = tfma.EvalConfig(
    model_specs=[
        # This assumes a serving model with signature 'serving_default'. If
        # using estimator based EvalSavedModel, add signature_name: 'eval' and 
        # remove the label_key.
        tfma.ModelSpec(label_key='tips')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            # The metrics added here are in addition to those saved with the
            # model (assuming either a keras model or EvalSavedModel is used).
            # Any metrics added into the saved model (for example using
            # model.compile(..., metrics=[...]), etc) will be computed
            # automatically.
            # To add validation thresholds for metrics saved with the model,
            # add them keyed by metric name to the thresholds map.
            metrics=[
                tfma.MetricConfig(class_name='ExampleCount'),
                tfma.MetricConfig(class_name='BinaryAccuracy',
                  threshold=tfma.MetricThreshold(
                      value_threshold=tfma.GenericValueThreshold(
                          lower_bound={'value': 0.5}),
                      # Change threshold will be ignored if there is no
                      # baseline model resolved from MLMD (first run).
                      change_threshold=tfma.GenericChangeThreshold(
                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                          absolute={'value': -1e-10})))
            ]
        )
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        # Data can be sliced along a feature column. In this case, data is
        # sliced along feature column trip_start_hour.
        tfma.SlicingSpec(feature_keys=['trip_start_hour'])
    ])
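
To make the idea of slicing concrete, here is a minimal plain-Python sketch that computes binary accuracy once over all rows (the overall slice) and once per value of trip_start_hour. It does not use TFMA; the helper names and the toy rows are made up for illustration only.

```python
from collections import defaultdict


def binary_accuracy(rows):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(1 for row in rows if row['prediction'] == row['label'])
    return correct / len(rows)


def sliced_binary_accuracy(rows, feature_key):
    """Group rows by one feature value and compute accuracy per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_key]].append(row)
    return {value: binary_accuracy(group) for value, group in groups.items()}


# Made-up toy data, for illustration only.
toy_rows = [
    {'trip_start_hour': 8, 'label': 1, 'prediction': 1},
    {'trip_start_hour': 8, 'label': 0, 'prediction': 1},
    {'trip_start_hour': 20, 'label': 0, 'prediction': 0},
    {'trip_start_hour': 20, 'label': 1, 'prediction': 1},
]

overall = binary_accuracy(toy_rows)  # analogous to the empty SlicingSpec
per_hour = sliced_binary_accuracy(toy_rows, 'trip_start_hour')  # one slice per hour
print(overall, per_hour)
```

The overall number can hide differences between slices: in the toy data the 8 o'clock slice is noticeably worse than the 20 o'clock slice, which is the kind of effect a slicing spec is meant to surface.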

Next, we pass this configuration to Evaluator and run it.

# Use TFMA to compute evaluation statistics over features of a model and
# validate them against a baseline.

# The model resolver is only required if performing model validation in addition
# to evaluation. In this case we validate against the latest blessed model. If
# no model has been blessed before (as in this case) the evaluator will make our
# candidate the first blessed model.
model_resolver = tfx.dsl.Resolver(
      strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
      model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
      model_blessing=tfx.dsl.Channel(
          type=tfx.types.standard_artifacts.ModelBlessing)).with_id(
              'latest_blessed_model_resolver')
context.run(model_resolver)

evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)
INFO:absl:Running driver for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running publisher for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running driver for Evaluator
INFO:absl:MetadataStore with DB connection initialized
2021-07-27 09:09:55.670102: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Running executor for Evaluator
2021-07-27 09:09:55.673778: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Nonempty beam arg extra_packages already includes dependency
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        },\n        {\n          "class_name": "BinaryAccuracy",\n          "threshold": {\n            "change_threshold": {\n              "absolute": -1e-10,\n              "direction": "HIGHER_IS_BETTER"\n            },\n            "value_threshold": {\n              "lower_bound": 0.5\n            }\n          }\n        }\n      ]\n    }\n  ],\n  "model_specs": [\n    {\n      "label_key": "tips"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': None, 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_eval_shared_model'
ERROR:absl:There are change thresholds, but the baseline is missing. This is allowed only when rubber stamping (first run).
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  label_key: "tips"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  metrics {
    class_name: "BinaryAccuracy"
    threshold {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Using /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model/6/Format-Serving as  model.
Exception ignored in: <function CapturableResource.__del__ at 0x7f2a1f16d9e0>
Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/tracking/tracking.py", line 277, in __del__
    self._destroy_resource()
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 924, in _call
    results = self._stateful_fn(*args, **kwds)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3022, in __call__
    filtered_flat_args) = self._maybe_define_function(args, kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3444, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3289, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 999, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
AttributeError: 'NoneType' object has no attribute '__wrapped__'
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f2a90368b90> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f29e04590d0>).
INFO:absl:The 'example_splits' parameter is not set, using 'eval' split.
INFO:absl:Evaluating model.
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        },\n        {\n          "class_name": "BinaryAccuracy",\n          "threshold": {\n            "change_threshold": {\n              "absolute": -1e-10,\n              "direction": "HIGHER_IS_BETTER"\n            },\n            "value_threshold": {\n              "lower_bound": 0.5\n            }\n          }\n        }\n      ]\n    }\n  ],\n  "model_specs": [\n    {\n      "label_key": "tips"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': None, 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_extractors'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  label_key: "tips"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  metrics {
    class_name: "BinaryAccuracy"
    threshold {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
  model_names: ""
}

WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f29e3987050> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f29e011d590>).
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f29c83c43d0> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f29c8161ed0>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f295867efd0> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f295868bd10>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f2526140410> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f2526309cd0>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f29587df150> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f2a902f1490>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f2960342a10> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f2a63a7af10>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.

Two checkpoint references resolved to different objects (<tensorflow.python.keras.saving.saved_model.load.TensorFlowTransform>TransformFeaturesLayer object at 0x7f250d20c090> and <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f250d230fd0>).
INFO:absl:Evaluation complete. Results written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Evaluator/evaluation/8.
INFO:absl:Checking validation results.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:113: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:absl:Blessing result True written to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Evaluator/blessing/8.
INFO:absl:Running publisher for Evaluator
INFO:absl:MetadataStore with DB connection initialized

Now let's examine the output artifacts of Evaluator.

evaluator.outputs
{'evaluation': Channel(
     type_name: ModelEvaluation
     artifacts: [Artifact(artifact: id: 15
 type_id: 20
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Evaluator/evaluation/8"
 custom_properties {
   key: "name"
   value {
     string_value: "evaluation"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Evaluator"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 20
 name: "ModelEvaluation"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'blessing': Channel(
     type_name: ModelBlessing
     artifacts: [Artifact(artifact: id: 16
 type_id: 21
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Evaluator/blessing/8"
 custom_properties {
   key: "blessed"
   value {
     int_value: 1
   }
 }
 custom_properties {
   key: "current_model"
   value {
     string_value: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Trainer/model/6"
   }
 }
 custom_properties {
   key: "current_model_id"
   value {
     int_value: 13
   }
 }
 custom_properties {
   key: "name"
   value {
     string_value: "blessing"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Evaluator"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 21
 name: "ModelBlessing"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 )}

Using the evaluation output, we can show the default visualization of global metrics on the entire evaluation set.

context.show(evaluator.outputs['evaluation'])

To see the visualization for sliced evaluation metrics, we can call the TensorFlow Model Analysis library directly.

import tensorflow_model_analysis as tfma

# Get the TFMA output result path and load the result.
PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
tfma_result = tfma.load_eval_result(PATH_TO_RESULT)

# Show data sliced along feature column trip_start_hour.
tfma.view.render_slicing_metrics(
    tfma_result, slicing_column='trip_start_hour')
SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'trip_start_hour:19',…

This visualization shows the same metrics, but computed at every feature value of trip_start_hour instead of on the entire evaluation set.

TensorFlow Model Analysis supports many other visualizations, such as Fairness Indicators and plotting a time series of model performance. To learn more, see the tutorial.

Since we added thresholds to our config, validation output is also available. The presence of a blessing artifact indicates that our model passed validation. Since this is the first validation being performed, the candidate is automatically blessed.

blessing_uri = evaluator.outputs['blessing'].get()[0].uri
!ls -l {blessing_uri}
total 0
-rw-rw-r-- 1 kbuilder kbuilder 0 Jul 27 09:10 BLESSED
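
The same check can be done in Python instead of with ls. The following is a minimal sketch using only the standard library. The helper name is_blessed is hypothetical (it is not part of TFX), and the sketch assumes that a passing validation is marked by an empty file named BLESSED in the blessing artifact's directory, as the listing above shows. In the notebook you would pass blessing_uri; here a temporary directory stands in for it.

```python
import os
import tempfile


def is_blessed(blessing_dir: str) -> bool:
    """Return True if the directory contains a BLESSED marker file.

    Assumption: a passing validation is signalled by a file named BLESSED,
    matching the directory listing shown in this tutorial.
    """
    return os.path.isfile(os.path.join(blessing_dir, "BLESSED"))


# Stand-in for blessing_uri: a temporary directory with an empty marker file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "BLESSED"), "w").close()

# A second directory without the marker, to show the negative case.
empty_dir = tempfile.mkdtemp()

print(is_blessed(demo_dir))
print(is_blessed(empty_dir))
```

Checking for the marker file only tells you that validation passed; the validation details themselves are loaded separately from the evaluation output.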

You can also verify the success by loading the validation result record:

PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
print(tfma.load_validation_result(PATH_TO_RESULT))
validation_ok: true
validation_details {
  slicing_details {
    slicing_spec {
    }
    num_matching_slices: 25
  }
}

Pusher

The Pusher component is usually at the end of a TFX pipeline. It checks whether a model has passed validation, and if so, exports the model to _serving_model_dir.

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
context.run(pusher)
INFO:absl:Running driver for Pusher
INFO:absl:MetadataStore with DB connection initialized
2021-07-27 09:10:20.780757: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:623] No property is defined for the Type
INFO:absl:Running executor for Pusher
INFO:absl:Model version: 1627377020
INFO:absl:Model written to serving path /tmp/tmpf2y8jc9r/serving_model/taxi_simple/1627377020.
INFO:absl:Model pushed to /tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Pusher/pushed_model/9.
INFO:absl:Running publisher for Pusher
INFO:absl:MetadataStore with DB connection initialized
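
To make the "check the blessing, then export" behavior concrete, here is a simplified sketch in plain Python. It is an illustration under stated assumptions, not the Pusher's actual implementation: the function push_if_blessed and all directory names are hypothetical, the blessing is assumed to be a BLESSED marker file, and the version directory is assumed to be named after a Unix timestamp (the version 1627377020 in the log above matches the run time of 09:10:20 on 2021-07-27).

```python
import os
import shutil
import tempfile
import time
from typing import Optional


def push_if_blessed(model_dir: str, blessing_dir: str,
                    base_directory: str,
                    version: Optional[int] = None) -> Optional[str]:
    """Copy model_dir to base_directory/<version> only if it is blessed.

    Returns the destination path, or None when the model was not pushed.
    """
    # Assumption: a blessed model is marked by a file named BLESSED.
    if not os.path.isfile(os.path.join(blessing_dir, "BLESSED")):
        return None
    # Assumption: the version defaults to the current Unix timestamp.
    if version is None:
        version = int(time.time())
    destination = os.path.join(base_directory, str(version))
    shutil.copytree(model_dir, destination)
    return destination


# Hypothetical stand-ins for the trainer output and the blessing artifacts.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "saved_model.pb"), "w") as f:
    f.write("placeholder")

blessed_dir = tempfile.mkdtemp()
open(os.path.join(blessed_dir, "BLESSED"), "w").close()
unblessed_dir = tempfile.mkdtemp()
serving_dir = tempfile.mkdtemp()

pushed_to = push_if_blessed(model_dir, blessed_dir, serving_dir,
                            version=1627377020)
skipped = push_if_blessed(model_dir, unblessed_dir, serving_dir,
                          version=1627377021)
print(pushed_to)
print(skipped)
```

Gating the copy on the blessing keeps a model that failed validation out of the serving directory, which is why the Pusher takes both the model and the blessing as inputs.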

Let's examine the output artifacts of Pusher.

pusher.outputs
{'pushed_model': Channel(
     type_name: PushedModel
     artifacts: [Artifact(artifact: id: 17
 type_id: 23
 uri: "/tmp/tfx-interactive-2021-07-27T09_07_38.527065-m86gazca/Pusher/pushed_model/9"
 custom_properties {
   key: "name"
   value {
     string_value: "pushed_model"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Pusher"
   }
 }
 custom_properties {
   key: "pushed"
   value {
     int_value: 1
   }
 }
 custom_properties {
   key: "pushed_destination"
   value {
     string_value: "/tmp/tmpf2y8jc9r/serving_model/taxi_simple/1627377020"
   }
 }
 custom_properties {
   key: "pushed_version"
   value {
     string_value: "1627377020"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.0.0"
   }
 }
 state: LIVE
 , artifact_type: id: 23
 name: "PushedModel"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 )}
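
The pushed_destination above ends in a numeric version directory under the serving base directory. Anything that reads models from that base directory has to choose one of these version directories, and a common convention is to take the highest version number. The sketch below illustrates that convention with the standard library only; latest_version_dir is a hypothetical helper, and the directory names are made up for the demonstration.

```python
import os
import tempfile
from typing import Optional


def latest_version_dir(base_directory: str) -> Optional[str]:
    """Return the subdirectory with the largest all-digit name, or None."""
    versions = [
        name for name in os.listdir(base_directory)
        if name.isdigit() and os.path.isdir(os.path.join(base_directory, name))
    ]
    if not versions:
        return None
    # Compare numerically so that, for example, "10" sorts after "9".
    return os.path.join(base_directory, max(versions, key=int))


# Hypothetical serving directory with two versions and one unrelated folder.
base = tempfile.mkdtemp()
for name in ("1627376000", "1627377020", "scratch"):
    os.mkdir(os.path.join(base, name))

latest = latest_version_dir(base)
print(latest)
```

Because the versions here are timestamps, the highest number is also the most recent push.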

In particular, the Pusher exports your model in the SavedModel format, which looks like this:

push_uri = pusher.outputs['pushed_model'].get()[0].uri
model = tf.saved_model.load(push_uri)

for item in model.signatures.items():
  pp.pprint(item)
('serving_default',
 <ConcreteFunction signature_wrapper(*, examples) at 0x7F250C448490>)

We're finished with our tour of built-in TFX components!