
TFX Estimator Component Tutorial

A Component-by-Component Introduction to TensorFlow Extended (TFX)

This Colab-based tutorial interactively walks through each built-in component of TensorFlow Extended (TFX).

It covers every step in an end-to-end machine learning pipeline, from data ingestion to pushing a model to serving.

When you're done, the contents of this notebook can be automatically exported as TFX pipeline source code, which you can orchestrate with Apache Airflow and Apache Beam.

Background

This notebook demonstrates how to use TFX in a Jupyter/Colab environment. Here, we walk through the Chicago Taxi example in an interactive notebook.

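Working in an interactive notebook is a useful way to become familiar with the structure of a TFX pipeline. It is also useful as a lightweight development environment when developing your own pipelines, but be aware that interactive notebooks differ in how they are orchestrated and in how they access metadata artifacts.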

Orchestration

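In a production deployment of TFX, you use an orchestrator such as Apache Airflow, Kubeflow Pipelines, or Apache Beam to orchestrate a predefined pipeline graph of TFX components. In an interactive notebook, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells.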
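To make "the notebook itself is the orchestrator" concrete, here is a toy model in plain Python. The classes `ToyComponent` and `ToyNotebookContext` are hypothetical and are not TFX APIs: each `run()` call executes one component immediately and stores its output, so the execution order is simply the order in which you call `run()`, with no predefined graph.

```python
from typing import Callable, Dict, List


class ToyComponent:
    """Hypothetical stand-in for a component: a name plus a function of upstream outputs."""

    def __init__(self, name: str, fn: Callable[[Dict[str, object]], object]) -> None:
        self.name = name
        self.fn = fn


class ToyNotebookContext:
    """Runs components immediately, in call order, like executing notebook cells."""

    def __init__(self) -> None:
        self.outputs: Dict[str, object] = {}
        self.run_order: List[str] = []

    def run(self, component: ToyComponent) -> object:
        # No pipeline graph: the caller decides the order by the order of run() calls.
        result = component.fn(self.outputs)
        self.outputs[component.name] = result
        self.run_order.append(component.name)
        return result


toy_context = ToyNotebookContext()
toy_context.run(ToyComponent("ExampleGen", lambda upstream: [1.0, 2.0, 3.0]))
toy_context.run(ToyComponent(
    "StatisticsGen",
    lambda upstream: sum(upstream["ExampleGen"]) / len(upstream["ExampleGen"])))
print(toy_context.run_order)
print(toy_context.outputs["StatisticsGen"])
```

A production orchestrator would instead receive the whole graph up front and schedule components from their declared dependencies.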

Metadata

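In a production deployment of TFX, you access metadata through the ML Metadata (MLMD) API. MLMD stores metadata properties in a database such as MySQL or SQLite, and stores the metadata payloads in a persistent store such as your filesystem. In an interactive notebook, both properties and payloads are stored in an ephemeral SQLite database in the /tmp directory on the Jupyter notebook or Colab server.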
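The split between "properties in a database" and "payloads on a filesystem" can be sketched with the standard library. This sketch assumes a made-up single-table layout (`artifact` with `type` and `uri` columns). It is not MLMD's actual schema, only an illustration of the idea: the database records where an artifact lives, and the artifact's bytes live in ordinary files under the system temp directory (/tmp on Linux).

```python
import os
import sqlite3
import tempfile

# Throwaway root directory, similar in spirit to the notebook's ephemeral pipeline root.
root = tempfile.mkdtemp(prefix="toy-mlmd-")

# Hypothetical one-table layout; real MLMD uses a different, richer schema.
db = sqlite3.connect(os.path.join(root, "metadata.sqlite"))
db.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT, uri TEXT)")

# The payload itself is written to the filesystem...
payload_uri = os.path.join(root, "CsvExampleGen", "examples", "1")
os.makedirs(payload_uri)
with open(os.path.join(payload_uri, "data.txt"), "w") as f:
    f.write("example payload")

# ...while the database stores only properties (the artifact type and its location).
db.execute("INSERT INTO artifact (type, uri) VALUES (?, ?)", ("Examples", payload_uri))
db.commit()

for row in db.execute("SELECT id, type, uri FROM artifact"):
    print(row)
db.close()
```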

Setup

First, we install and import the necessary packages, set up paths, and download the data.

Upgrade Pip

To avoid upgrading Pip on your own system when running locally, we first check that we're running in Colab. You can, of course, upgrade your local system separately.

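try:
  # Colab's helpers live in the google.colab package; this import fails elsewhere.
  import google.colab
  !pip install --upgrade pip
except ImportError:
  pass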
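If you'd rather not rely on a failing import, you can ask whether Colab's package is importable without actually importing it. This sketch assumes that Colab exposes its helpers as `google.colab`, as in `from google.colab import drive`. `running_in_colab` is a hypothetical helper name.

```python
import importlib.util


def running_in_colab() -> bool:
    """Return True if the google.colab package can be found on this interpreter."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ModuleNotFoundError, ValueError):
        # find_spec imports the parent package "google" first; if it is missing
        # (or has no usable spec), we cannot be inside Colab.
        return False


print(running_in_colab())
```

In a notebook cell, you could then guard the upgrade with `if running_in_colab():` followed by the `!pip install --upgrade pip` line.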

Install TFX

!pip install -U tfx

Did you restart the runtime?

If you are using Google Colab, you must restart the runtime the first time you run the cell above (Runtime > Restart runtime...). This is because of the way Colab loads packages.

Import packages

We import the necessary packages, including the standard TFX component classes.

import os
import pprint
import tempfile
import urllib.request

import absl
import tensorflow as tf
import tensorflow_model_analysis as tfma
tf.get_logger().propagate = False
pp = pprint.PrettyPrinter()

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip

Let's check the library versions.

print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))
TensorFlow version: 2.6.2
TFX version: 1.4.0

Set up pipeline paths

# This is the root directory for your TFX pip package installation.
_tfx_root = tfx.__path__[0]

# This is the directory containing the TFX Chicago Taxi Pipeline example.
_taxi_root = os.path.join(_tfx_root, 'examples/chicago_taxi_pipeline')

# This is the path where your model will be pushed for serving.
_serving_model_dir = os.path.join(
    tempfile.mkdtemp(), 'serving_model/taxi_simple')

# Set up logging.
absl.logging.set_verbosity(absl.logging.INFO)

Download example data

We download the example dataset for use in our TFX pipeline.

The dataset we're using is the Taxi Trips dataset released by the City of Chicago. The columns in this dataset are:

pickup_community_area   fare                     trip_start_month
trip_start_hour         trip_start_day           trip_start_timestamp
pickup_latitude         pickup_longitude         dropoff_latitude
dropoff_longitude       trip_miles               pickup_census_tract
dropoff_census_tract    payment_type             company
trip_seconds            dropoff_community_area   tips

With this dataset, we will build a model that predicts the tips of a trip.

_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv'
_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)
('/tmp/tfx-data6e4_3xo9/data.csv', <http.client.HTTPMessage at 0x7f1a7e8cfb10>)

Let's take a quick look at the CSV file.

!head {_data_filepath}
pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
,12.45,5,19,6,1400269500,,,,,0.0,,,Credit Card,Chicago Elite Cab Corp. (Chicago Carriag,0,,0.0
,0,3,19,5,1362683700,,,,,0,,,Unknown,Chicago Elite Cab Corp.,300,,0
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
16,32.05,12,1,1,1417916700,41.953582125,-87.72345239,,,15.4,,,Cash,,1200,,0.0
30,38.45,10,10,5,1444301100,41.839086906,-87.714003807,,,14.6,,,Cash,,2580,,0.0
11,14.65,1,1,3,1358213400,41.978829526,-87.771166703,,,5.81,,,Cash,,1080,,0.0
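Many fields in these rows are empty. A quick way to see which columns are missing in each row is to parse a few of them with the standard `csv` module. The header and the first and third data rows below are copied verbatim from the `head` output above. The empty fields are the ones that later appear as empty `int64_list` / `float_list` entries in the tf.Example records that ExampleGen produces.

```python
import csv
import io

# Header plus two data rows, copied from the `head` output above.
SAMPLE = (
    "pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,"
    "trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,"
    "dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,"
    "payment_type,company,trip_seconds,dropoff_community_area,tips\n"
    ",12.45,5,19,6,1400269500,,,,,0.0,,,Credit Card,"
    "Chicago Elite Cab Corp. (Chicago Carriag,0,,0.0\n"
    "60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,"
    "Taxi Affiliation Services,1380,,0.0\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # An empty string means the value is absent for this trip.
    missing = [name for name, value in row.items() if value == ""]
    print(len(row), "columns; missing:", missing)
```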

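Disclaimer: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one's own risk.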

Create the InteractiveContext

Last, we create an InteractiveContext, which lets us run TFX components interactively in this notebook.

# Here, we create an InteractiveContext using default parameters. This will
# use a temporary directory with an ephemeral ML Metadata database instance.
# To use your own pipeline root or database, the optional properties
# `pipeline_root` and `metadata_connection_config` may be passed to
# InteractiveContext. Calls to InteractiveContext are no-ops outside of the
# notebook.
context = InteractiveContext()
WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4 as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/metadata.sqlite.

Run TFX components interactively

In the cells that follow, we create TFX components one by one, run each of them, and visualize their output artifacts.

ExampleGen

The ExampleGen component is usually at the start of a TFX pipeline. It will:

  1. Split the data into training and evaluation sets (by default, 2/3 training + 1/3 evaluation)
  2. Convert the data into the tf.Example format
  3. Copy the data into the _tfx_root directory for other components to access
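The default 2/3 : 1/3 split comes from ExampleGen's default output configuration: two hash buckets for "train" and one for "eval". The sketch below illustrates that hash-bucket idea in plain Python. The choice of MD5 over the record text is an assumption of this sketch, not the hash TFX uses internally. The point is that a stable hash sends the same record to the same split on every run.

```python
import hashlib
from typing import Sequence, Tuple

# Mirrors the default bucket counts described above: 2 for train, 1 for eval.
DEFAULT_BUCKETS: Sequence[Tuple[str, int]] = (("train", 2), ("eval", 1))


def assign_split(record: str, buckets: Sequence[Tuple[str, int]] = DEFAULT_BUCKETS) -> str:
    """Deterministically map a record to a split via hash buckets."""
    total = sum(count for _, count in buckets)
    # MD5 is an illustrative choice; any stable, well-mixed hash works for the sketch.
    bucket = int(hashlib.md5(record.encode("utf-8")).hexdigest(), 16) % total
    for name, count in buckets:
        if bucket < count:
            return name
        bucket -= count
    raise AssertionError("unreachable: bucket index exceeds total")


records = [f"row-{i}" for i in range(3000)]
splits = [assign_split(r) for r in records]
print(splits.count("train"), splits.count("eval"))
```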

ExampleGen takes as input the path to your data source. In our case, this is the _data_root path that contains the downloaded CSV.

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
context.run(example_gen)
INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen
INFO:absl:Generating examples.
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
INFO:absl:Processing input csv data /tmp/tfx-data6e4_3xo9/* to TFExample.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized

Let's examine the output artifacts of ExampleGen. This component produces two artifacts, training examples and evaluation examples:

artifact = example_gen.outputs['examples'].get()[0]
print(artifact.split_names, artifact.uri)
["train", "eval"] /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/CsvExampleGen/examples/1
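`split_names` is printed as a JSON-style list, and each split is stored in a `Split-<name>` subdirectory of the artifact URI. The next cell builds the `Split-train` path this way. As a sketch, here is how you might derive every split directory from those two values. The strings are copied from the output above; in the notebook you would use `artifact.split_names` and `artifact.uri` directly.

```python
import json
import os

# Values copied from the printed output above.
split_names = '["train", "eval"]'
uri = "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/CsvExampleGen/examples/1"

# One "Split-<name>" directory per split, in the order the artifact lists them.
split_dirs = {name: os.path.join(uri, f"Split-{name}") for name in json.loads(split_names)}
for name, path in split_dirs.items():
    print(name, path)
```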

We can also take a look at the first three training examples:

# Get the URI of the output artifact representing the training examples, which is a directory
train_uri = os.path.join(example_gen.outputs['examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)
features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Chicago Elite Cab Corp. (Chicago Carriag"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 12.449999809265137
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Credit Card"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1400269500
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Taxi Affiliation Services"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 27.049999237060547
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.836151123046875
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.64878845214844
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 12.600000381469727
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 1380
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1380593700
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 16.450000762939453
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.98363494873047
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.72357940673828
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 6.900000095367432
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 780
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1446554700
      }
    }
  }
}
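The files read above with `tf.data.TFRecordDataset(..., compression_type="GZIP")` are GZIP-compressed TFRecord files. In the TFRecord format, each record is framed as a little-endian uint64 length, the masked CRC-32C of those 8 length bytes, the data bytes, and the masked CRC-32C of the data. The pure-Python sketch below writes and reads that framing. It is for illustration only, not a replacement for `tf.data`. The payloads here are arbitrary bytes rather than serialized `tf.train.Example` protos, and the bitwise CRC is slow.

```python
import gzip
import os
import struct
import tempfile
from typing import Iterable, Iterator


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord's masked CRC: rotate right by 15 bits, then add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path: str, records: Iterable[bytes]) -> None:
    with gzip.open(path, "wb") as f:
        for data in records:
            length = struct.pack("<Q", len(data))
            f.write(length)
            f.write(struct.pack("<I", masked_crc(length)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc(data)))


def read_records(path: str) -> Iterator[bytes]:
    with gzip.open(path, "rb") as f:
        while True:
            length = f.read(8)
            if not length:
                return  # clean end of file
            (size,) = struct.unpack("<Q", length)
            (length_crc,) = struct.unpack("<I", f.read(4))
            if length_crc != masked_crc(length):
                raise ValueError("corrupt length field")
            data = f.read(size)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(data):
                raise ValueError("corrupt record data")
            yield data


path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord.gz")
write_records(path, [b"first", b"second", b"third"])
print(list(read_records(path)))
```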

Now that ExampleGen has finished ingesting the data, the next step is data analysis.

StatisticsGen

The StatisticsGen component computes statistics over your dataset, both for data analysis and for use by downstream components. It uses the TensorFlow Data Validation library.

StatisticsGen takes as input the dataset we just ingested using ExampleGen.

statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/StatisticsGen/statistics/2/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/StatisticsGen/statistics/2/Split-eval.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:absl:Running publisher for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized

After StatisticsGen finishes running, we can visualize the output statistics. Try playing with the different plots!

context.show(statistics_gen.outputs['statistics'])
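For intuition about what a per-feature statistic is, here is a toy computation in plain Python over two columns of the nine rows shown in the `head` output above. TensorFlow Data Validation computes far richer statistics (histograms, quantiles, string statistics, and more). `numeric_stats` is a hypothetical helper, not a TFDV function.

```python
from typing import Dict, List

# pickup_community_area and fare from the nine data rows in the `head` output above.
ROWS: List[Dict[str, str]] = [
    {"pickup_community_area": "", "fare": "12.45"},
    {"pickup_community_area": "", "fare": "0"},
    {"pickup_community_area": "60", "fare": "27.05"},
    {"pickup_community_area": "10", "fare": "5.85"},
    {"pickup_community_area": "14", "fare": "16.65"},
    {"pickup_community_area": "13", "fare": "16.45"},
    {"pickup_community_area": "16", "fare": "32.05"},
    {"pickup_community_area": "30", "fare": "38.45"},
    {"pickup_community_area": "11", "fare": "14.65"},
]


def numeric_stats(rows: List[Dict[str, str]], feature: str) -> Dict[str, float]:
    """Count, missing count, and mean/min/max of the present values of one column."""
    values = [float(r[feature]) for r in rows if r[feature] != ""]
    stats: Dict[str, float] = {"count": len(values), "missing": len(rows) - len(values)}
    if values:
        stats.update(mean=sum(values) / len(values), min=min(values), max=max(values))
    return stats


for feature in ("pickup_community_area", "fare"):
    print(feature, numeric_stats(ROWS, feature))
```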

SchemaGen

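The SchemaGen component generates a schema based on your data statistics. (A schema defines the expected bounds, types, and properties of the features in your dataset.) It also uses the TensorFlow Data Validation library.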

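SchemaGen takes as input the statistics that we generated with StatisticsGen. As the log below shows, because exclude_splits is not set, it processes the statistics of both the train and eval splits.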

schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(schema_gen)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1205 10:59:36.632395  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Running executor for SchemaGen
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/SchemaGen/schema/3/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized

After SchemaGen finishes running, we can visualize the generated schema as a table.

context.show(schema_gen.outputs['schema'])

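Each feature in your dataset shows up as a row in the schema table, alongside its properties. The schema also captures all the values that a categorical feature takes on, denoted as its domain.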
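A toy version of schema inference helps show what "type" and "domain" mean. The sketch below infers a type per column and, for string columns, records the set of observed values as the domain. It uses four rows copied from the `head` output above. The `INT` / `FLOAT` / `BYTES` labels and the `required` flag are simplifications of this sketch. SchemaGen's real output is a schema protobuf (`schema.pbtxt`) with many more properties.

```python
from typing import Dict, List

# Three columns of the first four data rows from the `head` output above.
ROWS: List[Dict[str, str]] = [
    {"fare": "12.45", "payment_type": "Credit Card", "trip_seconds": "0"},
    {"fare": "0", "payment_type": "Unknown", "trip_seconds": "300"},
    {"fare": "27.05", "payment_type": "Cash", "trip_seconds": "1380"},
    {"fare": "5.85", "payment_type": "Cash", "trip_seconds": "180"},
]


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_schema(rows: List[Dict[str, str]]) -> Dict[str, dict]:
    """Infer a type per column, plus a domain for string (categorical) columns."""
    schema: Dict[str, dict] = {}
    for name in rows[0]:
        present = [r[name] for r in rows if r[name] != ""]
        if all(_is_number(v) for v in present):
            kind = "FLOAT" if any("." in v for v in present) else "INT"
            schema[name] = {"type": kind}
        else:
            # Non-numeric column: keep every observed value as its domain.
            schema[name] = {"type": "BYTES", "domain": sorted(set(present))}
        # In this toy, a feature is "required" only if no row is missing it.
        schema[name]["required"] = len(present) == len(rows)
    return schema


for name, spec in infer_schema(ROWS).items():
    print(name, spec)
```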

To learn more about schemas, see the SchemaGen documentation.

ExampleValidator

The ExampleValidator component detects anomalies in your data based on the expectations defined by the schema. It also uses the TensorFlow Data Validation library.

ExampleValidator takes as input the statistics from StatisticsGen and the schema from SchemaGen.

example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for ExampleValidator
INFO:absl:Validating schema against the computed statistics for split train.
INFO:absl:Validation complete for split train. Anomalies written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/ExampleValidator/anomalies/4/Split-train.
INFO:absl:Validating schema against the computed statistics for split eval.
INFO:absl:Validation complete for split eval. Anomalies written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/ExampleValidator/anomalies/4/Split-eval.
INFO:absl:Running publisher for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized

After ExampleValidator finishes running, we can visualize the anomalies as a table.

context.show(example_validator.outputs['anomalies'])

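In the anomalies table, we can see that there are no anomalies. This is what we'd expect, since this is the first dataset we've analyzed and the schema is tailored to it. You should review this schema: anything unexpected means an anomaly in the data. Once reviewed, the schema can be used to guard future data, and the anomalies produced here can be used to debug model performance, understand how your data evolves over time, and identify data errors.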
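"Guarding future data" amounts to checking new values against the schema's expectations. The sketch below does this with a hand-written toy schema (a plain dict, not TFDV's schema format) and two made-up rows that stand in for future data. The second row has an unparseable fare and a `payment_type` outside the domain. `find_anomalies` is a hypothetical helper. Unlike ExampleValidator, it inspects raw rows instead of comparing computed statistics against the schema.

```python
from typing import Dict, List, Tuple

# Hand-written toy schema: a type per feature and a domain for the categorical one.
SCHEMA: Dict[str, dict] = {
    "fare": {"type": "FLOAT"},
    "payment_type": {"type": "BYTES", "domain": {"Cash", "Credit Card", "Unknown"}},
}


def find_anomalies(rows: List[Dict[str, str]], schema: Dict[str, dict]) -> List[Tuple[int, str, str]]:
    """Return (row index, feature, reason) for each value that breaks the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            value = row.get(name, "")
            if value == "":
                continue  # this toy tolerates missing values
            if spec["type"] == "FLOAT":
                try:
                    float(value)
                except ValueError:
                    anomalies.append((i, name, f"not a float: {value!r}"))
            elif "domain" in spec and value not in spec["domain"]:
                anomalies.append((i, name, f"value outside domain: {value!r}"))
    return anomalies


# Made-up rows for illustration only.
new_rows = [
    {"fare": "9.25", "payment_type": "Cash"},
    {"fare": "n/a", "payment_type": "Voucher"},
]
for anomaly in find_anomalies(new_rows, SCHEMA):
    print(anomaly)
```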

Transform

The Transform component performs feature engineering for both training and serving. It uses the TensorFlow Transform library.

Transform takes as input the data from ExampleGen, the schema from SchemaGen, and a module that contains user-defined Transform code.

Let's look at an example of user-defined Transform code below (for an introduction to the TensorFlow Transform APIs, see the tutorial). First, we define a few constants for feature engineering:

_taxi_constants_module_file = 'taxi_constants.py'
%%writefile {_taxi_constants_module_file}

# Categorical features are assumed to each have a maximum value in the dataset.
MAX_CATEGORICAL_FEATURE_VALUES = [24, 31, 12]

CATEGORICAL_FEATURE_KEYS = [
    'trip_start_hour', 'trip_start_day', 'trip_start_month',
    'pickup_census_tract', 'dropoff_census_tract', 'pickup_community_area',
    'dropoff_community_area'
]

DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare', 'trip_seconds']

# Number of buckets used by tf.transform for encoding each feature.
FEATURE_BUCKET_COUNT = 10

BUCKET_FEATURE_KEYS = [
    'pickup_latitude', 'pickup_longitude', 'dropoff_latitude',
    'dropoff_longitude'
]

# Number of vocabulary terms used for encoding VOCAB_FEATURES by tf.transform
VOCAB_SIZE = 1000

# Count of out-of-vocab buckets in which unrecognized VOCAB_FEATURES are hashed.
OOV_SIZE = 10

VOCAB_FEATURE_KEYS = [
    'payment_type',
    'company',
]

# Keys
LABEL_KEY = 'tips'
FARE_KEY = 'fare'
Writing taxi_constants.py

Next, we write a preprocessing_fn that takes raw data as input and returns transformed features that our model can train on:

_taxi_transform_module_file = 'taxi_transform.py'
%%writefile {_taxi_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_FARE_KEY = taxi_constants.FARE_KEY
_LABEL_KEY = taxi_constants.LABEL_KEY


def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  for key in _DENSE_FLOAT_FEATURE_KEYS:
    # If sparse make it dense, setting nan's to 0 or '', and apply zscore.
    outputs[key] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  for key in _VOCAB_FEATURE_KEYS:
    # Build a vocabulary for this feature.
    outputs[key] = tft.compute_and_apply_vocabulary(
        _fill_in_missing(inputs[key]),
        top_k=_VOCAB_SIZE,
        num_oov_buckets=_OOV_SIZE)

  for key in _BUCKET_FEATURE_KEYS:
    outputs[key] = tft.bucketize(
        _fill_in_missing(inputs[key]), _FEATURE_BUCKET_COUNT)

  for key in _CATEGORICAL_FEATURE_KEYS:
    outputs[key] = _fill_in_missing(inputs[key])

  # Was this passenger a big tipper?
  taxi_fare = _fill_in_missing(inputs[_FARE_KEY])
  tips = _fill_in_missing(inputs[_LABEL_KEY])
  outputs[_LABEL_KEY] = tf.where(
      tf.math.is_nan(taxi_fare),
      tf.cast(tf.zeros_like(taxi_fare), tf.int64),
      # Test if the tip was > 20% of the fare.
      tf.cast(
          tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))

  return outputs


def _fill_in_missing(x):
  """Replace missing values in a SparseTensor.
  Fills in missing values of `x` with '' or 0, and converts to a dense tensor.
  Args:
    x: A `SparseTensor` of rank 2.  Its dense shape should have size at most 1
      in the second dimension.
  Returns:
    A rank 1 tensor where missing values of `x` have been filled in.
  """
  if not isinstance(x, tf.sparse.SparseTensor):
    return x

  default_value = '' if x.dtype == tf.string else 0
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)
Writing taxi_transform.py
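To make the semantics of `preprocessing_fn` easier to follow, here is a plain-Python sketch of four of its steps: filling in missing values, z-scoring, vocabulary lookup with out-of-vocabulary (OOV) buckets, and the "big tipper" label (tip greater than 20% of the fare). Every function here is hypothetical and only approximates the corresponding TensorFlow Transform call. The population standard deviation, the alphabetical tie-breaking in the vocabulary, and CRC32 as the OOV hash are all assumptions of this sketch. Unlike TensorFlow Transform, which computes the mean, standard deviation, and vocabulary over the whole training split, the sketch computes them over whatever list it is given.

```python
import math
import zlib
from collections import Counter
from typing import List, Optional, Sequence


def fill_in_missing(values: Sequence[Optional[object]], default: object) -> list:
    """Replace None with a default (0 or ''), as _fill_in_missing does for sparse input."""
    return [default if v is None else v for v in values]


def scale_to_z_score(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def apply_vocabulary(values: Sequence[str], top_k: int, num_oov_buckets: int) -> List[int]:
    """Map the top_k most frequent strings to ids; hash the rest into OOV buckets."""
    # Most frequent first; ties broken alphabetically (an assumption of this sketch).
    ranked = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = {term: i for i, (term, _) in enumerate(ranked[:top_k])}

    def lookup(v: str) -> int:
        if v in vocab:
            return vocab[v]
        # OOV ids come after the vocabulary ids.
        return len(vocab) + zlib.crc32(v.encode("utf-8")) % num_oov_buckets

    return [lookup(v) for v in values]


def big_tipper_label(fares: Sequence[float], tips: Sequence[float]) -> List[int]:
    """1 if the tip is more than 20% of the fare; 0 otherwise, and 0 for NaN fares."""
    return [0 if math.isnan(f) else int(t > 0.2 * f) for f, t in zip(fares, tips)]


print(scale_to_z_score([1.0, 2.0, 3.0]))
print(apply_vocabulary(["Cash", "Cash", "Credit Card", "Unknown"], top_k=2, num_oov_buckets=10))
# The missing fare is filled with 0, so any positive tip counts as a big tip.
print(big_tipper_label(fill_in_missing([10.0, 10.0, None], 0.0), [3.0, 1.0, 5.0]))
```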

Now, we pass this feature engineering code to the Transform component and run it to transform the data.

transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_taxi_transform_module_file))
context.run(transform)
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_transform.py' (including modules: ['taxi_constants', 'taxi_transform']).
INFO:absl:User module package has hash fingerprint version f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmp/tmp6h4enzoj/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmp1kilc09_', '--dist-dir', '/tmp/tmpu7dszvtp']
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
listing git files failed - pretending there aren't any
INFO:absl:Successfully built user code wheel distribution at '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'; target user module is 'taxi_transform'.
INFO:absl:Full user module path is 'taxi_transform@/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
I1205 10:59:37.233487  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Running executor for Transform
I1205 10:59:37.237077  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Analyze the 'train' split and transform all splits when splits_config is not set.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
INFO:absl:Installing '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp9ljjlr0t', '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_constants.py -> build/lib
copying taxi_transform.py -> build/lib
installing to /tmp/tmp1kilc09_
running install
running install_lib
copying build/lib/taxi_constants.py -> /tmp/tmp1kilc09_
copying build/lib/taxi_transform.py -> /tmp/tmp1kilc09_
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmp/tmp1kilc09_/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3.7.egg-info
running install_scripts
creating /tmp/tmp1kilc09_/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/WHEEL
creating '/tmp/tmpu7dszvtp/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' and adding '/tmp/tmp1kilc09_' to it
adding 'taxi_constants.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/METADATA'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/WHEEL'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/top_level.txt'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/RECORD'
removing /tmp/tmp1kilc09_
Processing /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl', 'stats_options_updater_fn': None} 'stats_options_updater_fn'
INFO:absl:Installing '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp6rcd17nh', '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
Processing /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:Installing '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpbq8i22l2', '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
Processing /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:289: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType], int] instead.
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary_1/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary_1/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType], int] instead.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature company has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature fare has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature payment_type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature tips has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_miles has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_seconds has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_day has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_month has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to VarLenSparseTensor.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
2021-12-05 10:59:51.571461: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/transform_graph/5/.temp_path/tftransform_tmp/7fa0435e7af949ef9e3b27e50d470602/assets
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/transform_graph/5/.temp_path/tftransform_tmp/f9ed85f61d1f4528846646b3a922c30c/assets
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized

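Let's examine the output artifacts of Transform. This component produces two main types of output (the listing below also shows supporting artifacts such as the analyzer cache and the pre- and post-transform schemas, statistics, and anomalies):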

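  • transform_graph is the graph that performs the preprocessing operations (this graph will be included in the serving and evaluation models).
  • transformed_examples represents the preprocessed training and evaluation data.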
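These two outputs are kept separate because preprocessing constants computed from the training data, such as a mean and a standard deviation, have to be reused unchanged wherever the model runs; that is why transform_graph ends up inside the serving and evaluation models. The sketch below is a pure-Python analogy of that analyze-then-transform split using a z-score. It does not use the tf.Transform API, and the function name analyze and the fare values are hypothetical.

```python
import statistics
from typing import Callable, List


def analyze(train_values: List[float]) -> Callable[[float], float]:
    """Analyze phase: compute constants once over the whole training set."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)

    def transform(x: float) -> float:
        # Transform phase: apply the frozen constants to any single input,
        # whether it comes from training, evaluation, or a serving request.
        return (x - mean) / std if std > 0 else 0.0

    return transform


# Hypothetical fare values used only to fit the constants.
scale_fare = analyze([5.0, 7.5, 10.0, 12.5, 15.0])

train_features = [scale_fare(v) for v in [5.0, 10.0, 15.0]]
serving_feature = scale_fare(10.0)  # identical raw value gives identical feature
```

In tf.Transform the analyze step is a full pass over the data, and its results are stored in the transform graph, which plays the role of the returned transform function in this analogy.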
transform.outputs
{'transform_graph': Channel(
     type_name: TransformGraph
     artifacts: [Artifact(artifact: id: 5
 type_id: 22
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/transform_graph/5"
 custom_properties {
   key: "name"
   value {
     string_value: "transform_graph"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 22
 name: "TransformGraph"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'transformed_examples': Channel(
     type_name: Examples
     artifacts: [Artifact(artifact: id: 6
 type_id: 14
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/transformed_examples/5"
 properties {
   key: "split_names"
   value {
     string_value: "[\"train\", \"eval\"]"
   }
 }
 custom_properties {
   key: "name"
   value {
     string_value: "transformed_examples"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 14
 name: "Examples"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 properties {
   key: "version"
   value: INT
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'updated_analyzer_cache': Channel(
     type_name: TransformCache
     artifacts: [Artifact(artifact: id: 7
 type_id: 23
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/updated_analyzer_cache/5"
 custom_properties {
   key: "name"
   value {
     string_value: "updated_analyzer_cache"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 23
 name: "TransformCache"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'pre_transform_schema': Channel(
     type_name: Schema
     artifacts: [Artifact(artifact: id: 8
 type_id: 18
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/pre_transform_schema/5"
 custom_properties {
   key: "name"
   value {
     string_value: "pre_transform_schema"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 18
 name: "Schema"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'pre_transform_stats': Channel(
     type_name: ExampleStatistics
     artifacts: [Artifact(artifact: id: 9
 type_id: 16
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/pre_transform_stats/5"
 custom_properties {
   key: "name"
   value {
     string_value: "pre_transform_stats"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 16
 name: "ExampleStatistics"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'post_transform_schema': Channel(
     type_name: Schema
     artifacts: [Artifact(artifact: id: 10
 type_id: 18
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/post_transform_schema/5"
 custom_properties {
   key: "name"
   value {
     string_value: "post_transform_schema"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 18
 name: "Schema"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'post_transform_stats': Channel(
     type_name: ExampleStatistics
     artifacts: [Artifact(artifact: id: 11
 type_id: 16
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/post_transform_stats/5"
 custom_properties {
   key: "name"
   value {
     string_value: "post_transform_stats"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 16
 name: "ExampleStatistics"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'post_transform_anomalies': Channel(
     type_name: ExampleAnomalies
     artifacts: [Artifact(artifact: id: 12
 type_id: 20
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Transform/post_transform_anomalies/5"
 custom_properties {
   key: "name"
   value {
     string_value: "post_transform_anomalies"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Transform"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 20
 name: "ExampleAnomalies"
 properties {
   key: "span"
   value: INT
 }
 properties {
   key: "split_names"
   value: STRING
 }
 )]
     additional_properties: {}
     additional_custom_properties: {}
 )}

Take a peek at the transform_graph artifact. It points to a directory containing three subdirectories.

transform_graph_uri = transform.outputs['transform_graph'].get()[0].uri
os.listdir(transform_graph_uri)
['transform_fn', 'transformed_metadata', 'metadata']

The transformed_metadata subdirectory contains the schema of the preprocessed data. The transform_fn subdirectory contains the actual preprocessing graph. The metadata subdirectory contains the schema of the original data.
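If you want to check an artifact directory programmatically rather than by eye, a small standard-library helper like the one below can report which of the three subdirectories are present. The helper name and the description strings are hypothetical; the subdirectory names are taken from the os.listdir() output above. (The trainer module later opens this same directory with tft.TFTransformOutput.)

```python
import os
import tempfile
from typing import Dict

# Subdirectory names as listed by os.listdir() above, with their roles.
EXPECTED_SUBDIRS = {
    "transform_fn": "preprocessing graph",
    "transformed_metadata": "schema of the preprocessed data",
    "metadata": "schema of the original data",
}


def describe_transform_graph_dir(path: str) -> Dict[str, bool]:
    """Return, for each expected subdirectory, whether it exists under path."""
    return {name: os.path.isdir(os.path.join(path, name))
            for name in EXPECTED_SUBDIRS}


# Usage with a throwaway directory standing in for the real artifact URI.
with tempfile.TemporaryDirectory() as fake_uri:
    os.mkdir(os.path.join(fake_uri, "transform_fn"))
    os.mkdir(os.path.join(fake_uri, "metadata"))
    report = describe_transform_graph_dir(fake_uri)
```

In the notebook you would pass the real URI, i.e. transform.outputs['transform_graph'].get()[0].uri, instead of the temporary directory.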

We can also take a look at the first three transformed examples:

# Get the URI of the output artifact representing the transformed examples, which is a directory
train_uri = os.path.join(transform.outputs['transformed_examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)
features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 8
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 0.06106060370802879
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 1
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: -0.15886740386486053
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: -0.7118487358093262
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 1.2521241903305054
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.532160758972168
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: 0.5509493350982666
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 48
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 0.3873794674873352
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.21955278515815735
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: 0.0019067146349698305
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
}

After the Transform component has transformed your data into features, the next step is to train a model.

Trainer

The Trainer component trains a model that you define in TensorFlow (using either the Estimator API or the Keras API with model_to_estimator).

Trainer takes as input the schema from SchemaGen, the transformed data and graph from Transform, training parameters, and a module that contains the user-defined model code.
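In the module shown below, the schema arrives as the second argument of trainer_fn, and the remaining inputs arrive as attributes of trainer_fn_args. The stand-in below lists only the attributes that this tutorial's module actually reads, which can be handy for exercising its pure-Python logic outside TFX. It is a hypothetical dataclass, not TFX's own argument class, and the paths and step counts are made-up example values.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class FakeTrainerFnArgs:
    """Hypothetical stand-in for the object TFX passes to trainer_fn."""
    train_files: List[str]            # transformed training examples (from Transform)
    eval_files: List[str]             # transformed evaluation examples (from Transform)
    transform_output: str             # transform_graph directory (from Transform)
    serving_model_dir: str            # where the trained model is written
    train_steps: int                  # training parameter
    eval_steps: int                   # training parameter
    base_model: Optional[str] = None  # optional warm-start directory
    data_accessor: Any = None         # supplied by TFX at run time


args = FakeTrainerFnArgs(
    train_files=["/tmp/example/Split-train/*"],
    eval_files=["/tmp/example/Split-eval/*"],
    transform_output="/tmp/example/transform_graph",
    serving_model_dir="/tmp/example/serving_model",
    train_steps=1000,
    eval_steps=150,
)
```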

Let's look at the example user-defined model code below (for an introduction to the TensorFlow Estimator APIs, see the tutorial):
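One piece of arithmetic in the module is worth working through first: trainer_fn sizes the DNN layers by exponential decay, computing max(2, int(first_dnn_layer_size * dnn_decay_factor**i)) for each layer index i. The sketch below reproduces that expression with the module's values (100 units, 4 layers, decay 0.7) so you can inspect the resulting hidden_units list; the function name decayed_layer_sizes is hypothetical.

```python
from typing import List


def decayed_layer_sizes(first: int, num_layers: int, decay: float) -> List[int]:
    # Same expression as in trainer_fn: shrink each layer geometrically,
    # truncate to int, and never go below 2 units.
    return [max(2, int(first * decay**i)) for i in range(num_layers)]


sizes = decayed_layer_sizes(first=100, num_layers=4, decay=0.7)
print(sizes)
```

Because int() truncates toward zero, a product that floating-point arithmetic lands just below a whole number is rounded down, so it is safer to print the list than to assume round numbers. The floor of 2 only matters for much deeper stacks.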

_taxi_trainer_module_file = 'taxi_trainer.py'
%%writefile {_taxi_trainer_module_file}

import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils
from tfx_bsl.tfxio import dataset_options

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_MAX_CATEGORICAL_FEATURE_VALUES = taxi_constants.MAX_CATEGORICAL_FEATURE_VALUES
_LABEL_KEY = taxi_constants.LABEL_KEY


# Tf.Transform considers these features as "raw"
def _get_raw_feature_spec(schema):
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _build_estimator(config, hidden_units=None, warm_start_from=None):
  """Build an estimator for predicting the tipping behavior of taxi riders.
  Args:
    config: tf.estimator.RunConfig defining the runtime environment for the
      estimator (including model_dir).
    hidden_units: [int], the layer sizes of the DNN (input layer first)
    warm_start_from: Optional directory to warm start from.
  Returns:
    A tf.estimator.DNNLinearCombinedClassifier configured with the given
    feature columns and hidden units.
  """
  real_valued_columns = [
      tf.feature_column.numeric_column(key, shape=())
      for key in _DENSE_FLOAT_FEATURE_KEYS
  ]
  categorical_columns = [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_VOCAB_SIZE + _OOV_SIZE, default_value=0)
      for key in _VOCAB_FEATURE_KEYS
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_FEATURE_BUCKET_COUNT, default_value=0)
      for key in _BUCKET_FEATURE_KEYS
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(  # pylint: disable=g-complex-comprehension
          key,
          num_buckets=num_buckets,
          default_value=0) for key, num_buckets in zip(
              _CATEGORICAL_FEATURE_KEYS,
              _MAX_CATEGORICAL_FEATURE_VALUES)
  ]
  return tf.estimator.DNNLinearCombinedClassifier(
      config=config,
      linear_feature_columns=categorical_columns,
      dnn_feature_columns=real_valued_columns,
      dnn_hidden_units=hidden_units or [100, 70, 50, 25],
      warm_start_from=warm_start_from)


def _example_serving_receiver_fn(tf_transform_graph, schema):
  """Build the serving inputs.
  Args:
    tf_transform_graph: A TFTransformOutput.
    schema: the schema of the input data.
  Returns:
    Tensorflow graph which parses examples, applying tf-transform to them.
  """
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(_LABEL_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_graph.transform_raw_features(
      serving_input_receiver.features)

  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)


def _eval_input_receiver_fn(tf_transform_graph, schema):
  """Build everything needed for the tf-model-analysis to run the model.
  Args:
    tf_transform_graph: A TFTransformOutput.
    schema: the schema of the input data.
  Returns:
    EvalInputReceiver function, which contains:
      - Tensorflow graph which parses raw untransformed features, applies the
        tf-transform preprocessing operators.
      - Set of raw, untransformed features.
      - Label against which predictions will be compared.
  """
  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_tensor')

  # Add a parse_example operator to the tensorflow graph, which will parse
  # raw, untransformed, tf examples.
  features = tf.io.parse_example(serialized_tf_example, raw_feature_spec)

  # Now that we have our raw examples, process them through the tf-transform
  # function computed during the preprocessing step.
  transformed_features = tf_transform_graph.transform_raw_features(
      features)

  # The key name MUST be 'examples'.
  receiver_tensors = {'examples': serialized_tf_example}

  # NOTE: Model is driven by transformed features (since training works on the
  # materialized output of TFT), but slicing will happen on raw features.
  features.update(transformed_features)

  return tfma.export.EvalInputReceiver(
      features=features,
      receiver_tensors=receiver_tensors,
      labels=transformed_features[_LABEL_KEY])


def _input_fn(file_pattern, data_accessor, tf_transform_output, batch_size=200):
  """Generates features and label for tuning/training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      dataset_options.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY),
      tf_transform_output.transformed_metadata.schema)


# TFX will call this function
def trainer_fn(trainer_fn_args, schema):
  """Build the estimator using the high level API.
  Args:
    trainer_fn_args: Holds args used to train the model as name/value pairs.
    schema: Holds the schema of the training examples.
  Returns:
    A dict of the following:
      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  # Number of nodes in the first layer of the DNN
  first_dnn_layer_size = 100
  num_dnn_layers = 4
  dnn_decay_factor = 0.7

  train_batch_size = 40
  eval_batch_size = 40

  tf_transform_graph = tft.TFTransformOutput(trainer_fn_args.transform_output)

  train_input_fn = lambda: _input_fn(  # pylint: disable=g-long-lambda
      trainer_fn_args.train_files,
      trainer_fn_args.data_accessor,
      tf_transform_graph,
      batch_size=train_batch_size)

  eval_input_fn = lambda: _input_fn(  # pylint: disable=g-long-lambda
      trainer_fn_args.eval_files,
      trainer_fn_args.data_accessor,
      tf_transform_graph,
      batch_size=eval_batch_size)

  train_spec = tf.estimator.TrainSpec(
      train_input_fn,
      max_steps=trainer_fn_args.train_steps)

  serving_receiver_fn = lambda: _example_serving_receiver_fn(  # pylint: disable=g-long-lambda
      tf_transform_graph, schema)

  exporter = tf.estimator.FinalExporter('chicago-taxi', serving_receiver_fn)
  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=trainer_fn_args.eval_steps,
      exporters=[exporter],
      name='chicago-taxi-eval')

  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=999, keep_checkpoint_max=1)

  run_config = run_config.replace(model_dir=trainer_fn_args.serving_model_dir)

  estimator = _build_estimator(
      # Construct layer sizes with exponential decay
      hidden_units=[
          max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
          for i in range(num_dnn_layers)
      ],
      config=run_config,
      warm_start_from=trainer_fn_args.base_model)

  # Create an input receiver for TFMA processing
  receiver_fn = lambda: _eval_input_receiver_fn(  # pylint: disable=g-long-lambda
      tf_transform_graph, schema)

  return {
      'estimator': estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }
Writing taxi_trainer.py
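In trainer_fn, the hidden_units list shrinks each layer geometrically by dnn_decay_factor and never drops below two units. The standalone sketch below reproduces only that list comprehension, so you can inspect the layer sizes without building the estimator. The function name dnn_hidden_units is hypothetical and is not part of TFX or TensorFlow.

```python
def dnn_hidden_units(first_layer_size=100, num_layers=4, decay_factor=0.7):
    """Layer sizes that shrink geometrically, floored at 2 units.

    Hypothetical helper that mirrors the list comprehension passed as
    hidden_units in trainer_fn; it is not a TFX API.
    """
    return [
        max(2, int(first_layer_size * decay_factor**i))
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    # Defaults match first_dnn_layer_size, num_dnn_layers and dnn_decay_factor.
    print(dnn_hidden_units())
    # A much deeper stack shows the 2-unit floor taking effect.
    print(dnn_hidden_units(num_layers=15))
```

int() truncates toward zero, so floating-point rounding can make a layer one unit smaller than exact decimal arithmetic would suggest. With four layers, the max(2, ...) guard never triggers; it only matters for much deeper stacks.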

Now we pass this model code to the Trainer component and run it to train the model.
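Before running it, it helps to work out what the training arguments imply. TrainArgs(num_steps=10000) sets max_steps, and RunConfig(save_checkpoints_steps=999) writes a checkpoint every 999 steps. The sketch below is a rough model of the checkpoint schedule you can see in the training log; it is not a TensorFlow API. The helper names checkpoint_steps and examples_processed are hypothetical.

```python
def checkpoint_steps(max_steps, save_every):
    """Steps at which a periodic checkpoint is written during training.

    Simple model of the behavior seen in the log: one checkpoint every
    `save_every` steps, plus a final one at `max_steps` if that step is not
    already a multiple of `save_every`. The initial step-0 checkpoint is
    not included.
    """
    steps = list(range(save_every, max_steps + 1, save_every))
    if not steps or steps[-1] != max_steps:
        steps.append(max_steps)
    return steps


def examples_processed(num_steps, batch_size):
    """Number of examples consumed by num_steps batches of batch_size rows."""
    return num_steps * batch_size


if __name__ == "__main__":
    # Values from TrainArgs(num_steps=10000) and save_checkpoints_steps=999.
    print(checkpoint_steps(10000, 999))
    # train_batch_size and eval_batch_size are both 40 in trainer_fn.
    print(examples_processed(10000, 40), examples_processed(5000, 40))
```

In the log that follows the Trainer cell, only the checkpoint at step 999 is evaluated. The later checkpoints are skipped with "Skip the current checkpoint eval due to throttle secs (600 secs)" because each 999-step interval finishes in well under 600 seconds.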

from tfx.components.trainer.executor import Executor
from tfx.dsl.components.base import executor_spec

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(_taxi_trainer_module_file),
    custom_executor_spec=executor_spec.ExecutorClassSpec(Executor),
    examples=transform.outputs['transformed_examples'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=tfx.proto.TrainArgs(num_steps=10000),
    eval_args=tfx.proto.EvalArgs(num_steps=5000))
context.run(trainer)
WARNING:absl:`custom_executor_spec` is deprecated. Please customize component directly.
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py' (including modules: ['taxi_constants', 'taxi_trainer', 'taxi_transform']).
INFO:absl:User module package has hash fingerprint version e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmp/tmpfdfqeq3n/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmplwndr27q', '--dist-dir', '/tmp/tmpm5jkf1c7']
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
listing git files failed - pretending there aren't any
INFO:absl:Successfully built user code wheel distribution at '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'; target user module is 'taxi_trainer'.
INFO:absl:Full user module path is 'taxi_trainer@/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'
INFO:absl:Running driver for Trainer
INFO:absl:MetadataStore with DB connection initialized
I1205 11:00:05.421522  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Running executor for Trainer
I1205 11:00:05.425110  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
INFO:absl:udf_utils.get_fn {'train_args': '{\n  "num_steps": 10000\n}', 'eval_args': '{\n  "num_steps": 5000\n}', 'module_file': None, 'run_fn': None, 'trainer_fn': None, 'custom_config': 'null', 'module_path': 'taxi_trainer@/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'} 'trainer_fn'
INFO:absl:Installing '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmpudspobnm', '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl']
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_constants.py -> build/lib
copying taxi_trainer.py -> build/lib
copying taxi_transform.py -> build/lib
installing to /tmp/tmplwndr27q
running install
running install_lib
copying build/lib/taxi_constants.py -> /tmp/tmplwndr27q
copying build/lib/taxi_transform.py -> /tmp/tmplwndr27q
copying build/lib/taxi_trainer.py -> /tmp/tmplwndr27q
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmplwndr27q/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3.7.egg-info
running install_scripts
creating /tmp/tmplwndr27q/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/WHEEL
creating '/tmp/tmpm5jkf1c7/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl' and adding '/tmp/tmplwndr27q' to it
adding 'taxi_constants.py'
adding 'taxi_trainer.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/RECORD'
removing /tmp/tmplwndr27q
Processing /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl
INFO:absl:Successfully installed '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'.
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 999, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:absl:Training model.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 999 or save_checkpoints_secs None.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/engine/base_layer_v1.py:1684: UserWarning: `layer.add_variable` is deprecated and will be removed in a future version. Please use `layer.add_weight` method instead.
  warnings.warn('`layer.add_variable` is deprecated and '
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/adagrad.py:84: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.7078417, step = 0
INFO:tensorflow:global_step/sec: 70.3244
INFO:tensorflow:loss = 0.54806644, step = 100 (1.423 sec)
INFO:tensorflow:global_step/sec: 86.7936
INFO:tensorflow:loss = 0.61360043, step = 200 (1.152 sec)
INFO:tensorflow:global_step/sec: 85.3687
INFO:tensorflow:loss = 0.4860243, step = 300 (1.171 sec)
INFO:tensorflow:global_step/sec: 86.4491
INFO:tensorflow:loss = 0.4932023, step = 400 (1.157 sec)
INFO:tensorflow:global_step/sec: 84.918
INFO:tensorflow:loss = 0.41420126, step = 500 (1.177 sec)
INFO:tensorflow:global_step/sec: 85.5433
INFO:tensorflow:loss = 0.502645, step = 600 (1.169 sec)
INFO:tensorflow:global_step/sec: 85.7348
INFO:tensorflow:loss = 0.5135077, step = 700 (1.166 sec)
INFO:tensorflow:global_step/sec: 85.9959
INFO:tensorflow:loss = 0.50064766, step = 800 (1.163 sec)
INFO:tensorflow:global_step/sec: 84.4301
INFO:tensorflow:loss = 0.5338023, step = 900 (1.185 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 999...
INFO:tensorflow:Saving checkpoints for 999 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/saver.py:971: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 999...
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-12-05T11:00:25
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt-999
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 45.25082s
INFO:tensorflow:Finished evaluation at 2021-12-05-11:01:10
INFO:tensorflow:Saving dict for global step 999: accuracy = 0.77114, accuracy_baseline = 0.77114, auc = 0.92330086, auc_precision_recall = 0.66446304, average_loss = 0.46160534, global_step = 999, label/mean = 0.22886, loss = 0.46160552, precision = 0.0, prediction/mean = 0.24982427, recall = 0.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 999: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt-999
INFO:tensorflow:global_step/sec: 2.07624
INFO:tensorflow:loss = 0.5403578, step = 1000 (48.163 sec)
INFO:tensorflow:global_step/sec: 86.3781
INFO:tensorflow:loss = 0.38168782, step = 1100 (1.158 sec)
INFO:tensorflow:global_step/sec: 85.2624
INFO:tensorflow:loss = 0.39346403, step = 1200 (1.173 sec)
INFO:tensorflow:global_step/sec: 83.7912
INFO:tensorflow:loss = 0.40447283, step = 1300 (1.194 sec)
INFO:tensorflow:global_step/sec: 84.0061
INFO:tensorflow:loss = 0.44532022, step = 1400 (1.190 sec)
INFO:tensorflow:global_step/sec: 85.6364
INFO:tensorflow:loss = 0.44722432, step = 1500 (1.169 sec)
INFO:tensorflow:global_step/sec: 86.1981
INFO:tensorflow:loss = 0.38483262, step = 1600 (1.159 sec)
INFO:tensorflow:global_step/sec: 86.8631
INFO:tensorflow:loss = 0.5259759, step = 1700 (1.152 sec)
INFO:tensorflow:global_step/sec: 84.9455
INFO:tensorflow:loss = 0.55505085, step = 1800 (1.177 sec)
INFO:tensorflow:global_step/sec: 86.3588
INFO:tensorflow:loss = 0.38577095, step = 1900 (1.158 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1998...
INFO:tensorflow:Saving checkpoints for 1998 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1998...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 78.271
INFO:tensorflow:loss = 0.5068237, step = 2000 (1.277 sec)
INFO:tensorflow:global_step/sec: 86.0626
INFO:tensorflow:loss = 0.43203792, step = 2100 (1.162 sec)
INFO:tensorflow:global_step/sec: 84.691
INFO:tensorflow:loss = 0.4243142, step = 2200 (1.181 sec)
INFO:tensorflow:global_step/sec: 86.057
INFO:tensorflow:loss = 0.33626375, step = 2300 (1.162 sec)
INFO:tensorflow:global_step/sec: 86.4836
INFO:tensorflow:loss = 0.5215112, step = 2400 (1.156 sec)
INFO:tensorflow:global_step/sec: 86.1571
INFO:tensorflow:loss = 0.3480332, step = 2500 (1.161 sec)
INFO:tensorflow:global_step/sec: 83.5733
INFO:tensorflow:loss = 0.3900601, step = 2600 (1.197 sec)
INFO:tensorflow:global_step/sec: 85.2641
INFO:tensorflow:loss = 0.41936797, step = 2700 (1.174 sec)
INFO:tensorflow:global_step/sec: 84.707
INFO:tensorflow:loss = 0.37252873, step = 2800 (1.179 sec)
INFO:tensorflow:global_step/sec: 84.4798
INFO:tensorflow:loss = 0.38240016, step = 2900 (1.184 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 2997...
INFO:tensorflow:Saving checkpoints for 2997 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 2997...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 75.8418
INFO:tensorflow:loss = 0.2528301, step = 3000 (1.318 sec)
INFO:tensorflow:global_step/sec: 84.156
INFO:tensorflow:loss = 0.4254836, step = 3100 (1.188 sec)
INFO:tensorflow:global_step/sec: 85.0661
INFO:tensorflow:loss = 0.5024188, step = 3200 (1.176 sec)
INFO:tensorflow:global_step/sec: 82.2437
INFO:tensorflow:loss = 0.3909358, step = 3300 (1.216 sec)
INFO:tensorflow:global_step/sec: 82.2637
INFO:tensorflow:loss = 0.328662, step = 3400 (1.216 sec)
INFO:tensorflow:global_step/sec: 84.4683
INFO:tensorflow:loss = 0.36957046, step = 3500 (1.184 sec)
INFO:tensorflow:global_step/sec: 84.4389
INFO:tensorflow:loss = 0.43177825, step = 3600 (1.184 sec)
INFO:tensorflow:global_step/sec: 85.2814
INFO:tensorflow:loss = 0.43844128, step = 3700 (1.173 sec)
INFO:tensorflow:global_step/sec: 83.9934
INFO:tensorflow:loss = 0.3894402, step = 3800 (1.191 sec)
INFO:tensorflow:global_step/sec: 85.6644
INFO:tensorflow:loss = 0.3499531, step = 3900 (1.167 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3996...
INFO:tensorflow:Saving checkpoints for 3996 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3996...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 77.2294
INFO:tensorflow:loss = 0.43472967, step = 4000 (1.294 sec)
INFO:tensorflow:global_step/sec: 86.9355
INFO:tensorflow:loss = 0.31338528, step = 4100 (1.151 sec)
INFO:tensorflow:global_step/sec: 86.7796
INFO:tensorflow:loss = 0.45728058, step = 4200 (1.152 sec)
INFO:tensorflow:global_step/sec: 86.8483
INFO:tensorflow:loss = 0.39699784, step = 4300 (1.151 sec)
INFO:tensorflow:global_step/sec: 87.1248
INFO:tensorflow:loss = 0.43616992, step = 4400 (1.148 sec)
INFO:tensorflow:global_step/sec: 86.8816
INFO:tensorflow:loss = 0.35230064, step = 4500 (1.151 sec)
INFO:tensorflow:global_step/sec: 86.9788
INFO:tensorflow:loss = 0.36814964, step = 4600 (1.150 sec)
INFO:tensorflow:global_step/sec: 86.884
INFO:tensorflow:loss = 0.39265686, step = 4700 (1.151 sec)
INFO:tensorflow:global_step/sec: 86.3142
INFO:tensorflow:loss = 0.3569767, step = 4800 (1.159 sec)
INFO:tensorflow:global_step/sec: 86.7831
INFO:tensorflow:loss = 0.38372093, step = 4900 (1.152 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4995...
INFO:tensorflow:Saving checkpoints for 4995 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4995...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 77.8516
INFO:tensorflow:loss = 0.37753737, step = 5000 (1.284 sec)
INFO:tensorflow:global_step/sec: 86.9472
INFO:tensorflow:loss = 0.39870018, step = 5100 (1.150 sec)
INFO:tensorflow:global_step/sec: 87.6235
INFO:tensorflow:loss = 0.3469496, step = 5200 (1.141 sec)
INFO:tensorflow:global_step/sec: 85.5072
INFO:tensorflow:loss = 0.4431352, step = 5300 (1.169 sec)
INFO:tensorflow:global_step/sec: 86.753
INFO:tensorflow:loss = 0.4120473, step = 5400 (1.153 sec)
INFO:tensorflow:global_step/sec: 87.9292
INFO:tensorflow:loss = 0.41318005, step = 5500 (1.137 sec)
INFO:tensorflow:global_step/sec: 86.9944
INFO:tensorflow:loss = 0.33395153, step = 5600 (1.150 sec)
INFO:tensorflow:global_step/sec: 85.7159
INFO:tensorflow:loss = 0.39095598, step = 5700 (1.167 sec)
INFO:tensorflow:global_step/sec: 86.5248
INFO:tensorflow:loss = 0.3990689, step = 5800 (1.156 sec)
INFO:tensorflow:global_step/sec: 87.7908
INFO:tensorflow:loss = 0.35857546, step = 5900 (1.139 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5994...
INFO:tensorflow:Saving checkpoints for 5994 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5994...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 77.8735
INFO:tensorflow:loss = 0.3701624, step = 6000 (1.284 sec)
INFO:tensorflow:global_step/sec: 87.5076
INFO:tensorflow:loss = 0.41708413, step = 6100 (1.143 sec)
INFO:tensorflow:global_step/sec: 84.466
INFO:tensorflow:loss = 0.29821724, step = 6200 (1.184 sec)
INFO:tensorflow:global_step/sec: 83.526
INFO:tensorflow:loss = 0.35562894, step = 6300 (1.197 sec)
INFO:tensorflow:global_step/sec: 87.5455
INFO:tensorflow:loss = 0.28250116, step = 6400 (1.142 sec)
INFO:tensorflow:global_step/sec: 86.3403
INFO:tensorflow:loss = 0.3280113, step = 6500 (1.158 sec)
INFO:tensorflow:global_step/sec: 87.024
INFO:tensorflow:loss = 0.3482268, step = 6600 (1.149 sec)
INFO:tensorflow:global_step/sec: 85.355
INFO:tensorflow:loss = 0.37907737, step = 6700 (1.172 sec)
INFO:tensorflow:global_step/sec: 84.621
INFO:tensorflow:loss = 0.31550306, step = 6800 (1.182 sec)
INFO:tensorflow:global_step/sec: 83.3363
INFO:tensorflow:loss = 0.3832593, step = 6900 (1.202 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 6993...
INFO:tensorflow:Saving checkpoints for 6993 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 6993...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 74.4455
INFO:tensorflow:loss = 0.41803008, step = 7000 (1.342 sec)
INFO:tensorflow:global_step/sec: 82.7229
INFO:tensorflow:loss = 0.32837537, step = 7100 (1.209 sec)
INFO:tensorflow:global_step/sec: 84.3715
INFO:tensorflow:loss = 0.33435482, step = 7200 (1.185 sec)
INFO:tensorflow:global_step/sec: 84.2735
INFO:tensorflow:loss = 0.26065814, step = 7300 (1.187 sec)
INFO:tensorflow:global_step/sec: 85.663
INFO:tensorflow:loss = 0.41420022, step = 7400 (1.167 sec)
INFO:tensorflow:global_step/sec: 87.0079
INFO:tensorflow:loss = 0.40608707, step = 7500 (1.150 sec)
INFO:tensorflow:global_step/sec: 87.7408
INFO:tensorflow:loss = 0.36437988, step = 7600 (1.140 sec)
INFO:tensorflow:global_step/sec: 87.4937
INFO:tensorflow:loss = 0.39505738, step = 7700 (1.144 sec)
INFO:tensorflow:global_step/sec: 88.4098
INFO:tensorflow:loss = 0.2943158, step = 7800 (1.130 sec)
INFO:tensorflow:global_step/sec: 87.3161
INFO:tensorflow:loss = 0.352277, step = 7900 (1.145 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 7992...
INFO:tensorflow:Saving checkpoints for 7992 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 7992...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 78.0033
INFO:tensorflow:loss = 0.24916664, step = 8000 (1.282 sec)
INFO:tensorflow:global_step/sec: 87.818
INFO:tensorflow:loss = 0.23849675, step = 8100 (1.139 sec)
INFO:tensorflow:global_step/sec: 86.7864
INFO:tensorflow:loss = 0.35711345, step = 8200 (1.152 sec)
INFO:tensorflow:global_step/sec: 87.5709
INFO:tensorflow:loss = 0.3992316, step = 8300 (1.142 sec)
INFO:tensorflow:global_step/sec: 86.4715
INFO:tensorflow:loss = 0.38699418, step = 8400 (1.157 sec)
INFO:tensorflow:global_step/sec: 87.1347
INFO:tensorflow:loss = 0.27517205, step = 8500 (1.147 sec)
INFO:tensorflow:global_step/sec: 87.6778
INFO:tensorflow:loss = 0.3764573, step = 8600 (1.140 sec)
INFO:tensorflow:global_step/sec: 86.488
INFO:tensorflow:loss = 0.38588572, step = 8700 (1.156 sec)
INFO:tensorflow:global_step/sec: 88.0878
INFO:tensorflow:loss = 0.34926754, step = 8800 (1.135 sec)
INFO:tensorflow:global_step/sec: 86.5916
INFO:tensorflow:loss = 0.3552958, step = 8900 (1.155 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 8991...
INFO:tensorflow:Saving checkpoints for 8991 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 8991...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 75.4932
INFO:tensorflow:loss = 0.36349216, step = 9000 (1.325 sec)
INFO:tensorflow:global_step/sec: 83.4161
INFO:tensorflow:loss = 0.35490102, step = 9100 (1.199 sec)
INFO:tensorflow:global_step/sec: 87.0142
INFO:tensorflow:loss = 0.36661166, step = 9200 (1.149 sec)
INFO:tensorflow:global_step/sec: 86.8802
INFO:tensorflow:loss = 0.42985326, step = 9300 (1.151 sec)
INFO:tensorflow:global_step/sec: 87.3449
INFO:tensorflow:loss = 0.47281235, step = 9400 (1.145 sec)
INFO:tensorflow:global_step/sec: 88.3826
INFO:tensorflow:loss = 0.22590041, step = 9500 (1.131 sec)
INFO:tensorflow:global_step/sec: 87.3166
INFO:tensorflow:loss = 0.4162217, step = 9600 (1.145 sec)
INFO:tensorflow:global_step/sec: 87.5265
INFO:tensorflow:loss = 0.37611717, step = 9700 (1.143 sec)
INFO:tensorflow:global_step/sec: 86.1899
INFO:tensorflow:loss = 0.3856167, step = 9800 (1.160 sec)
INFO:tensorflow:global_step/sec: 87.7519
INFO:tensorflow:loss = 0.24105208, step = 9900 (1.140 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 9990...
INFO:tensorflow:Saving checkpoints for 9990 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 9990...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10000...
INFO:tensorflow:Saving checkpoints for 10000 into /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10000...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-12-05T11:02:58
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 43.13040s
INFO:tensorflow:Finished evaluation at 2021-12-05-11:03:41
INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.787805, accuracy_baseline = 0.771235, auc = 0.9339468, auc_precision_recall = 0.70544505, average_loss = 0.3452758, global_step = 10000, label/mean = 0.228765, loss = 0.34527487, precision = 0.69398266, prediction/mean = 0.2301482, recall = 0.12956527
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Performing the final export in the end of training.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:145: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/export/chicago-taxi/temp-1638702221/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/export/chicago-taxi/temp-1638702221/saved_model.pb
INFO:tensorflow:Loss for final step: 0.3770034.
INFO:absl:Training complete. Model written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving. ModelRun written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6
INFO:absl:Exporting eval_savedmodel for TFMA.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-TFMA/temp-1638702224/assets
INFO:tensorflow:SavedModel written to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-TFMA/temp-1638702224/saved_model.pb
INFO:absl:Exported eval_savedmodel to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model_run/6/Format-TFMA.
WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/serving_model_dir/saved_model.pb"
INFO:absl:Serving model copied to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model/6/Format-Serving.
WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/eval_model_dir/saved_model.pb"
INFO:absl:Eval model copied to: /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model/6/Format-TFMA.
INFO:absl:Running publisher for Trainer
INFO:absl:MetadataStore with DB connection initialized

Analyze Training with TensorBoard

Optionally, you can connect TensorBoard to the Trainer to analyze the model's training curves.

# Get the URI of the output artifact representing the training logs, which is a directory
model_run_dir = trainer.outputs['model_run'].get()[0].uri

%load_ext tensorboard
%tensorboard --logdir {model_run_dir}

Evaluator

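The Evaluator component computes model performance metrics over the evaluation set, using the TensorFlow Model Analysis library. The Evaluator can also optionally validate that a newly trained model is better than the previous model. This is useful in a production pipeline, where a model may be trained and validated automatically every day. Because this notebook trains only one model, the Evaluator will automatically label the model as "good".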

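The Evaluator takes as input the data from ExampleGen, the trained model from Trainer, and a slicing configuration. The slicing configuration lets you slice your metrics on feature values (for example, how does the model perform on taxi trips that start at 8am versus 8pm?). See an example of this configuration below.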

eval_config = tfma.EvalConfig(
    model_specs=[
        # Using signature 'eval' implies the use of an EvalSavedModel. To use
        # a serving model, remove the signature so that it defaults to
        # 'serving_default', and add a label_key.
        tfma.ModelSpec(signature_name='eval')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            # The metrics added here are in addition to those saved with the
            # model (assuming either a keras model or EvalSavedModel is used).
            # Any metrics added into the saved model (for example using
            # model.compile(..., metrics=[...]), etc) will be computed
            # automatically.
            metrics=[
                tfma.MetricConfig(class_name='ExampleCount')
            ],
            # To add validation thresholds for metrics saved with the model,
            # add them keyed by metric name to the thresholds map.
            thresholds = {
                'accuracy': tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.5}),
                    # Change threshold will be ignored if there is no
                    # baseline model resolved from MLMD (first run).
                    change_threshold=tfma.GenericChangeThreshold(
                       direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                       absolute={'value': -1e-10}))
            }
        )
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        # Data can be sliced along a feature column. In this case, data is
        # sliced along feature column trip_start_hour.
        tfma.SlicingSpec(feature_keys=['trip_start_hour'])
    ])
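To make the effect of the two accuracy thresholds above concrete, here is a simplified, hypothetical model of how they combine. It is not TFMA's implementation: the function name is made up, and whether TFMA treats the change comparison as strict or non-strict is an assumption. With `absolute=-1e-10` and `HIGHER_IS_BETTER`, the change threshold effectively says "the candidate may not be worse than the baseline", and it is skipped when no baseline has been resolved.

```python
# Hypothetical, simplified model of the accuracy thresholds in eval_config.
# This is NOT TFMA's implementation; it only illustrates the intent.
from typing import Optional

LOWER_BOUND = 0.5          # value_threshold.lower_bound
ABSOLUTE_CHANGE = -1e-10   # change_threshold.absolute (HIGHER_IS_BETTER)


def passes_accuracy_thresholds(candidate: float,
                               baseline: Optional[float] = None) -> bool:
    """Return True if the candidate accuracy satisfies both thresholds."""
    # Value threshold: the candidate must reach the absolute lower bound.
    if candidate < LOWER_BOUND:
        return False
    # Change threshold: ignored when no baseline model was resolved
    # (for example, on the first run).
    if baseline is None:
        return True
    # HIGHER_IS_BETTER: candidate - baseline must not drop below
    # ABSOLUTE_CHANGE, i.e. the candidate may not be worse than the baseline.
    return candidate - baseline >= ABSOLUTE_CHANGE
```

On the first run there is no baseline, so the accuracy reported in the training log (0.787805) passes on the lower bound alone; a later candidate scoring 0.787805 against a blessed baseline of 0.80 would fail the change threshold.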

Next, we pass this configuration to the Evaluator and run it.

# Use TFMA to compute evaluation statistics over features of a model and
# validate them against a baseline.

# The model resolver is only required if performing model validation in addition
# to evaluation. In this case we validate against the latest blessed model. If
# no model has been blessed before (as in this case) the evaluator will make our
# candidate the first blessed model.
model_resolver = tfx.dsl.Resolver(
      strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
      model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
      model_blessing=tfx.dsl.Channel(
          type=tfx.types.standard_artifacts.ModelBlessing)).with_id(
              'latest_blessed_model_resolver')
context.run(model_resolver)

evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)
INFO:absl:Running driver for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running publisher for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running driver for Evaluator
INFO:absl:MetadataStore with DB connection initialized
I1205 11:03:46.279654  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Running executor for Evaluator
I1205 11:03:46.282887  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Nonempty beam arg extra_packages already includes dependency
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        }\n      ],\n      "thresholds": {\n        "accuracy": {\n          "change_threshold": {\n            "absolute": -1e-10,\n            "direction": "HIGHER_IS_BETTER"\n          },\n          "value_threshold": {\n            "lower_bound": 0.5\n          }\n        }\n      }\n    }\n  ],\n  "model_specs": [\n    {\n      "signature_name": "eval"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': 'null', 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_eval_shared_model'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Using /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model/6/Format-TFMA as  model.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
INFO:absl:The 'example_splits' parameter is not set, using 'eval' split.
INFO:absl:Evaluating model.
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        }\n      ],\n      "thresholds": {\n        "accuracy": {\n          "change_threshold": {\n            "absolute": -1e-10,\n            "direction": "HIGHER_IS_BETTER"\n          },\n          "value_threshold": {\n            "lower_bound": 0.5\n          }\n        }\n      }\n    }\n  ],\n  "model_specs": [\n    {\n      "signature_name": "eval"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': 'null', 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_extractors'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model/6/Format-TFMA/variables/variables
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info.
INFO:absl:Evaluation complete. Results written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Evaluator/evaluation/8.
INFO:absl:Checking validation results.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:absl:Blessing result True written to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Evaluator/blessing/8.
INFO:absl:Running publisher for Evaluator
INFO:absl:MetadataStore with DB connection initialized

Now let's examine the output artifacts of the Evaluator.

evaluator.outputs
{'evaluation': Channel(
     type_name: ModelEvaluation
     artifacts: [Artifact(artifact: id: 15
 type_id: 29
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Evaluator/evaluation/8"
 custom_properties {
   key: "name"
   value {
     string_value: "evaluation"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Evaluator"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 29
 name: "ModelEvaluation"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 ),
 'blessing': Channel(
     type_name: ModelBlessing
     artifacts: [Artifact(artifact: id: 16
 type_id: 30
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Evaluator/blessing/8"
 custom_properties {
   key: "blessed"
   value {
     int_value: 1
   }
 }
 custom_properties {
   key: "current_model"
   value {
     string_value: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Trainer/model/6"
   }
 }
 custom_properties {
   key: "current_model_id"
   value {
     int_value: 13
   }
 }
 custom_properties {
   key: "name"
   value {
     string_value: "blessing"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Evaluator"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 30
 name: "ModelBlessing"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 )}

Using the evaluation output, we can show the default visualization of global metrics on the entire evaluation set.

context.show(evaluator.outputs['evaluation'])

To see the visualization of sliced evaluation metrics, we can call the TensorFlow Model Analysis library directly.

import tensorflow_model_analysis as tfma

# Get the TFMA output result path and load the result.
PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
tfma_result = tfma.load_eval_result(PATH_TO_RESULT)

# Show data sliced along feature column trip_start_hour.
tfma.view.render_slicing_metrics(
    tfma_result, slicing_column='trip_start_hour')
SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'trip_start_hour:19',…

This visualization shows the same metrics, but computed at every feature value of trip_start_hour instead of on the entire evaluation set.
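Conceptually, each slice is just the same metric computed over the subset of examples that share a feature value, alongside the "Overall" slice produced by the empty `SlicingSpec()`. The sketch below illustrates this for accuracy in plain Python; the row format `(trip_start_hour, label, predicted_label)` and the function name are hypothetical, and TFMA itself computes metrics in Apache Beam rather than like this. The slice keys mimic the `trip_start_hour:19` style shown in the viewer output.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_slice(rows: Iterable[Tuple[int, int, int]]) -> Dict[str, float]:
    """Compute accuracy overall and per trip_start_hour value.

    Each row is a hypothetical (trip_start_hour, label, predicted_label) tuple.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for hour, label, predicted in rows:
        hit = int(label == predicted)
        # Every example counts toward the overall slice and its own hour slice.
        for key in ("Overall", f"trip_start_hour:{hour}"):
            correct[key] += hit
            total[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy data: two trips starting at 8am, two starting at 8pm.
result = accuracy_by_slice([(8, 1, 1), (8, 0, 1), (20, 0, 0), (20, 1, 1)])
```

Here `result` is `{'Overall': 0.75, 'trip_start_hour:8': 0.5, 'trip_start_hour:20': 1.0}`: the overall number hides the fact that the model does worse on morning trips, which is exactly what slicing is meant to reveal.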

TensorFlow Model Analysis supports many other visualizations, such as Fairness Indicators and plots of a time series of model performance. To learn more, see the tutorial.

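Since we added thresholds to the config, validation output is also available. The blessing artifact, which here contains a BLESSED file, indicates that our model passed validation. Because this is the first validation being performed, the candidate is automatically blessed.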

blessing_uri = evaluator.outputs['blessing'].get()[0].uri
!ls -l {blessing_uri}
total 0
-rw-rw-r-- 1 kbuilder kbuilder 0 Dec  5 11:03 BLESSED
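The `ls` output shows that the blessing is recorded as an empty marker file named `BLESSED` inside the artifact directory. A downstream script could check for that marker as sketched below. The helper name `is_blessed` is hypothetical, and the demo uses a throwaway temporary directory in place of the real `blessing_uri`. Inside the notebook you could also read the artifact's `blessed` custom property (shown as `int_value: 1` in the output above).

```python
import os
import tempfile


def is_blessed(blessing_uri: str) -> bool:
    """Return True if a BLESSED marker file exists in the blessing directory."""
    return os.path.isfile(os.path.join(blessing_uri, "BLESSED"))


# Demo on a temporary directory standing in for blessing_uri.
with tempfile.TemporaryDirectory() as demo_uri:
    before = is_blessed(demo_uri)
    # Create an empty marker file, like the 0-byte BLESSED file listed above.
    open(os.path.join(demo_uri, "BLESSED"), "w").close()
    after = is_blessed(demo_uri)
```

In the notebook itself, `is_blessed(blessing_uri)` should return `True` for the run shown above.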

We can also verify the success by loading the validation result record.

PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
print(tfma.load_validation_result(PATH_TO_RESULT))
validation_ok: true
validation_details {
  slicing_details {
    slicing_spec {
    }
    num_matching_slices: 25
  }
}

Pusher

The Pusher component usually sits at the end of a TFX pipeline. It checks whether a model has passed validation, and if so, exports the model to _serving_model_dir.

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
context.run(pusher)
INFO:absl:Running driver for Pusher
INFO:absl:MetadataStore with DB connection initialized
I1205 11:03:54.694877  1805 rdbms_metadata_access_object.cc:686] No property is defined for the Type
INFO:absl:Running executor for Pusher
INFO:absl:Model version: 1638702234
INFO:absl:Model written to serving path /tmp/tmposmo4233/serving_model/taxi_simple/1638702234.
INFO:absl:Model pushed to /tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Pusher/pushed_model/9.
INFO:absl:Running publisher for Pusher
INFO:absl:MetadataStore with DB connection initialized
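The log shows the Pusher's core behavior: only a blessed model is written, under a numeric version directory (1638702234, a Unix timestamp) inside the serving path. The following is a hypothetical, heavily simplified sketch of that "push only if blessed" logic using the standard library. It is not the TFX Pusher's implementation, which also records metadata and supports other push destinations. The function name and the directory layout in the demo are assumptions.

```python
import os
import shutil
import tempfile
import time
from typing import Optional


def push_if_blessed(model_dir: str, blessing_dir: str, base_directory: str,
                    version: Optional[int] = None) -> Optional[str]:
    """Hypothetical simplified Pusher: copy model_dir to
    base_directory/<version> only when blessing_dir holds a BLESSED marker."""
    if not os.path.isfile(os.path.join(blessing_dir, "BLESSED")):
        return None  # Not blessed: nothing is pushed.
    # Default to a Unix-timestamp version, like the one in the log above.
    version = int(time.time()) if version is None else version
    destination = os.path.join(base_directory, str(version))
    shutil.copytree(model_dir, destination)  # Creates parent dirs as needed.
    return destination


# Demo with throwaway directories standing in for real artifacts.
with tempfile.TemporaryDirectory() as root:
    model = os.path.join(root, "model")
    os.makedirs(model)
    open(os.path.join(model, "saved_model.pb"), "w").close()
    blessing = os.path.join(root, "blessing")
    os.makedirs(blessing)
    serving = os.path.join(root, "serving_model", "taxi_simple")

    unblessed_result = push_if_blessed(model, blessing, serving, version=1)
    open(os.path.join(blessing, "BLESSED"), "w").close()
    pushed = push_if_blessed(model, blessing, serving, version=2)
    pushed_ok = os.path.isfile(os.path.join(pushed, "saved_model.pb"))
    pushed_name = os.path.basename(pushed)
```

Gating on the blessing, rather than pushing unconditionally, is what keeps a model that failed the Evaluator's thresholds out of the serving directory.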

Let's examine the output artifacts of the Pusher.

pusher.outputs
{'pushed_model': Channel(
     type_name: PushedModel
     artifacts: [Artifact(artifact: id: 17
 type_id: 32
 uri: "/tmp/tfx-interactive-2021-12-05T10_59_24.898354-se36qxc4/Pusher/pushed_model/9"
 custom_properties {
   key: "name"
   value {
     string_value: "pushed_model"
   }
 }
 custom_properties {
   key: "producer_component"
   value {
     string_value: "Pusher"
   }
 }
 custom_properties {
   key: "pushed"
   value {
     int_value: 1
   }
 }
 custom_properties {
   key: "pushed_destination"
   value {
     string_value: "/tmp/tmposmo4233/serving_model/taxi_simple/1638702234"
   }
 }
 custom_properties {
   key: "pushed_version"
   value {
     string_value: "1638702234"
   }
 }
 custom_properties {
   key: "state"
   value {
     string_value: "published"
   }
 }
 custom_properties {
   key: "tfx_version"
   value {
     string_value: "1.4.0"
   }
 }
 state: LIVE
 , artifact_type: id: 32
 name: "PushedModel"
 )]
     additional_properties: {}
     additional_custom_properties: {}
 )}

In particular, the Pusher exports the model in the SavedModel format, which looks like this:

push_uri = pusher.outputs['pushed_model'].get()[0].uri
model = tf.saved_model.load(push_uri)

for item in model.signatures.items():
  pp.pprint(item)
('regression', <ConcreteFunction pruned(inputs) at 0x7F19BF0F9510>)
('classification', <ConcreteFunction pruned(inputs) at 0x7F19BE0EC350>)
('serving_default', <ConcreteFunction pruned(inputs) at 0x7F19BC6BE210>)
('predict', <ConcreteFunction pruned(examples) at 0x7F19BC4F9090>)
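Because each push lands in its own numeric version directory under the serving path, a consumer typically wants the newest one. TensorFlow Serving's default version policy serves the latest (highest-numbered) version. The sketch below finds that directory with the standard library. The helper name and the demo directory names are illustrative assumptions, with one version number taken from the Pusher log above.

```python
import os
import tempfile
from typing import Optional


def latest_version_dir(base_directory: str) -> Optional[str]:
    """Return the highest-numbered version sub-directory, or None if none exist."""
    versions = [
        name for name in os.listdir(base_directory)
        if name.isdigit() and os.path.isdir(os.path.join(base_directory, name))
    ]
    if not versions:
        return None
    # Compare numerically, not lexicographically.
    return os.path.join(base_directory, max(versions, key=int))


# Demo: two version directories plus a non-version directory that is ignored.
with tempfile.TemporaryDirectory() as serving_root:
    for name in ("1638700000", "1638702234", "tmp"):
        os.makedirs(os.path.join(serving_root, name))
    newest = os.path.basename(latest_version_dir(serving_root))
    empty_case = latest_version_dir(os.path.join(serving_root, "tmp"))
```

Here `newest` is `"1638702234"`, and `empty_case` is `None` because the `tmp` directory holds no version sub-directories.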

This completes our tour of the built-in TFX components.