簡単なTFXパイプラインを実行するための短いチュートリアル。
このノートブックベースのチュートリアルでは、単純な分類モデルのTFXパイプラインを作成して実行します。パイプラインは、ExampleGen、Trainer、Pusherの3つの重要なTFXコンポーネントで構成されます。パイプラインには、データのインポート、モデルのトレーニング、トレーニングされたモデルのエクスポートなど、最小限のMLワークフローが含まれています。
参照してくださいTFXパイプラインが理解TFXの様々な概念についての詳細を学ぶために。
設定
まず、TFX Pythonパッケージをインストールし、モデルに使用するデータセットをダウンロードする必要があります。
アップグレードピップ
ローカルで実行しているときにシステムでPipをアップグレードしないようにするには、Colabで実行していることを確認してください。もちろん、ローカルシステムは個別にアップグレードできます。
try:
import colab
!pip install --upgrade pip
except:
pass
TFXをインストールする
pip install -U tfx
ランタイムを再起動しましたか?
上記のセルを初めて実行するときにGoogleColabを使用している場合は、[ランタイムの再起動]ボタンをクリックするか、[ランタイム]> [ランタイムの再起動...]メニューを使用してランタイムを再起動する必要があります。これは、Colabがパッケージをロードする方法が原因です。
TensorFlowとTFXのバージョンを確認してください。
import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
TensorFlow version: 2.6.2 TFX version: 1.4.0
変数を設定する
パイプラインを定義するために使用されるいくつかの変数があります。これらの変数は必要に応じてカスタマイズできます。デフォルトでは、パイプラインからのすべての出力は現在のディレクトリの下に生成されます。
import os
PIPELINE_NAME = "penguin-simple"
# Output directory to store artifacts generated from the pipeline.
PIPELINE_ROOT = os.path.join('pipelines', PIPELINE_NAME)
# Path to a SQLite DB file to use as an MLMD storage.
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.db')
# Output directory where created models from the pipeline will be exported.
SERVING_MODEL_DIR = os.path.join('serving_model', PIPELINE_NAME)
from absl import logging
logging.set_verbosity(logging.INFO) # Set default logging level.
サンプルデータを準備する
TFXパイプラインで使用するサンプルデータセットをダウンロードします。私たちが使用しているデータセットがあるパーマーペンギンデータセットも他で使用されているTFX例。
このデータセットには、次の4つの数値機能があります。
- culmen_length_mm
- culmen_depth_mm
- フリッパー_長さ_mm
- body_mass_g
すべての特徴は、範囲[0,1]を持つようにすでに正規化されています。私たちは、予測、分類モデル構築するspecies
のペンギンのを。
TFX ExampleGenはディレクトリから入力を読み取るため、ディレクトリを作成してデータセットをそこにコピーする必要があります。
import urllib.request
import tempfile
DATA_ROOT = tempfile.mkdtemp(prefix='tfx-data') # Create a temporary directory.
_data_url = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'
_data_filepath = os.path.join(DATA_ROOT, "data.csv")
urllib.request.urlretrieve(_data_url, _data_filepath)
('/tmp/tfx-dataijanq9u3/data.csv', <http.client.HTTPMessage at 0x7f487953d110>)
CSVファイルをざっと見てみましょう。
head {_data_filepath}
species,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g 0,0.2545454545454545,0.6666666666666666,0.15254237288135594,0.2916666666666667 0,0.26909090909090905,0.5119047619047618,0.23728813559322035,0.3055555555555556 0,0.29818181818181805,0.5833333333333334,0.3898305084745763,0.1527777777777778 0,0.16727272727272732,0.7380952380952381,0.3559322033898305,0.20833333333333334 0,0.26181818181818167,0.892857142857143,0.3050847457627119,0.2638888888888889 0,0.24727272727272717,0.5595238095238096,0.15254237288135594,0.2569444444444444 0,0.25818181818181823,0.773809523809524,0.3898305084745763,0.5486111111111112 0,0.32727272727272727,0.5357142857142859,0.1694915254237288,0.1388888888888889 0,0.23636363636363636,0.9642857142857142,0.3220338983050847,0.3055555555555556
5つの値が表示されるはずです。 species
0、1または2のいずれか、およびすべての他の特徴は0と1の間の値を有するべきです。
パイプラインを作成する
TFXパイプラインは、PythonAPIを使用して定義されます。次の3つのコンポーネントで構成されるパイプラインを定義します。
- CsvExampleGen:データファイルを読み込み、さらに処理するためにTFX内部形式に変換します。複数ありますExampleGen様々なフォーマットのためのsが。このチュートリアルでは、CSVファイル入力を受け取るCsvExampleGenを使用します。
- トレーナー:MLモデルをトレーニングします。トレーナーコンポーネントは、ユーザーからのモデル定義のコードが必要です。あなたは、モデルを訓練し、_savedモデル形式で保存する方法を指定するTensorFlow APIを使用することができます。
- プッシャー:トレーニング済みモデルをTFXパイプラインの外部にコピーします。プッシャーコンポーネントは、訓練されたMLモデルの展開過程と考えることができます。
実際にパイプラインを定義する前に、まずトレーナーコンポーネントのモデルコードを作成する必要があります。
モデルトレーニングコードを書く
TensorFlow KerasAPIを使用して分類するための単純なDNNモデルを作成します。このモデルトレーニングコードは別のファイルに保存されます。
このチュートリアルでは、使用する一般的なトレーナーKerasベースのモデルをサポートTFXの。あなたは含むPythonのファイル記述する必要がrun_fn
ためのエントリポイントである機能、 Trainer
コンポーネントを。
_trainer_module_file = 'penguin_trainer.py'
%%writefile {_trainer_module_file}
from typing import List
from absl import logging
import tensorflow as tf
from tensorflow import keras
from tensorflow_transform.tf_metadata import schema_utils
from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tensorflow_metadata.proto.v0 import schema_pb2
_FEATURE_KEYS = [
'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_g'
]
_LABEL_KEY = 'species'
_TRAIN_BATCH_SIZE = 20
_EVAL_BATCH_SIZE = 10
# Since we're not generating or creating a schema, we will instead create
# a feature spec. Since there are a fairly small number of features this is
# manageable for this dataset.
_FEATURE_SPEC = {
**{
feature: tf.io.FixedLenFeature(shape=[1], dtype=tf.float32)
for feature in _FEATURE_KEYS
},
_LABEL_KEY: tf.io.FixedLenFeature(shape=[1], dtype=tf.int64)
}
def _input_fn(file_pattern: List[str],
data_accessor: tfx.components.DataAccessor,
schema: schema_pb2.Schema,
batch_size: int = 200) -> tf.data.Dataset:
"""Generates features and label for training.
Args:
file_pattern: List of paths or patterns of input tfrecord files.
data_accessor: DataAccessor for converting input to RecordBatch.
schema: schema of the input data.
batch_size: representing the number of consecutive elements of returned
dataset to combine in a single batch
Returns:
A dataset that contains (features, indices) tuple where features is a
dictionary of Tensors, and indices is a single Tensor of label indices.
"""
return data_accessor.tf_dataset_factory(
file_pattern,
tfxio.TensorFlowDatasetOptions(
batch_size=batch_size, label_key=_LABEL_KEY),
schema=schema).repeat()
def _build_keras_model() -> tf.keras.Model:
"""Creates a DNN Keras model for classifying penguin data.
Returns:
A Keras Model.
"""
# The model below is built with Functional API, please refer to
# https://www.tensorflow.org/guide/keras/overview for all API options.
inputs = [keras.layers.Input(shape=(1,), name=f) for f in _FEATURE_KEYS]
d = keras.layers.concatenate(inputs)
for _ in range(2):
d = keras.layers.Dense(8, activation='relu')(d)
outputs = keras.layers.Dense(3)(d)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(
optimizer=keras.optimizers.Adam(1e-2),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[keras.metrics.SparseCategoricalAccuracy()])
model.summary(print_fn=logging.info)
return model
# TFX Trainer will call this function.
def run_fn(fn_args: tfx.components.FnArgs):
"""Train the model based on given args.
Args:
fn_args: Holds args used to train the model as name/value pairs.
"""
# This schema is usually either an output of SchemaGen or a manually-curated
# version provided by pipeline author. A schema can also derived from TFT
# graph if a Transform component is used. In the case when either is missing,
# `schema_from_feature_spec` could be used to generate schema from very simple
# feature_spec, but the schema returned would be very primitive.
schema = schema_utils.schema_from_feature_spec(_FEATURE_SPEC)
train_dataset = _input_fn(
fn_args.train_files,
fn_args.data_accessor,
schema,
batch_size=_TRAIN_BATCH_SIZE)
eval_dataset = _input_fn(
fn_args.eval_files,
fn_args.data_accessor,
schema,
batch_size=_EVAL_BATCH_SIZE)
model = _build_keras_model()
model.fit(
train_dataset,
steps_per_epoch=fn_args.train_steps,
validation_data=eval_dataset,
validation_steps=fn_args.eval_steps)
# The result of the training should be saved in `fn_args.serving_model_dir`
# directory.
model.save(fn_args.serving_model_dir, save_format='tf')
Writing penguin_trainer.py
これで、TFXパイプラインを構築するためのすべての準備手順が完了しました。
パイプライン定義を書く
TFXパイプラインを作成する関数を定義します。 A Pipeline
オブジェクトは、TFXのサポートされているパイプラインのオーケストレーションシステムのいずれかを使用して実行することができTFXパイプラインを表しています。
def _create_pipeline(pipeline_name: str, pipeline_root: str, data_root: str,
module_file: str, serving_model_dir: str,
metadata_path: str) -> tfx.dsl.Pipeline:
"""Creates a three component penguin pipeline with TFX."""
# Brings data into the pipeline.
example_gen = tfx.components.CsvExampleGen(input_base=data_root)
# Uses user-provided Python function that trains a model.
trainer = tfx.components.Trainer(
module_file=module_file,
examples=example_gen.outputs['examples'],
train_args=tfx.proto.TrainArgs(num_steps=100),
eval_args=tfx.proto.EvalArgs(num_steps=5))
# Pushes the model to a filesystem destination.
pusher = tfx.components.Pusher(
model=trainer.outputs['model'],
push_destination=tfx.proto.PushDestination(
filesystem=tfx.proto.PushDestination.Filesystem(
base_directory=serving_model_dir)))
# Following three components will be included in the pipeline.
components = [
example_gen,
trainer,
pusher,
]
return tfx.dsl.Pipeline(
pipeline_name=pipeline_name,
pipeline_root=pipeline_root,
metadata_connection_config=tfx.orchestration.metadata
.sqlite_metadata_connection_config(metadata_path),
components=components)
パイプラインを実行する
TFXは、パイプラインを実行するための複数のオーケストレーターをサポートしています。このチュートリアルでは、使用するLocalDagRunner
ローカル環境にTFX Pythonパッケージおよび実行パイプラインに含まれています。有向非巡回グラフを表すTFXパイプラインを「DAG」と呼ぶことがよくあります。
LocalDagRunner
developemntとデバッグのための高速反復を提供します。 TFXは、KubeflowPipelinesやApacheAirflowなど、本番ユースケースに適した他のオーケストレーターもサポートしています。
参照クラウドAIプラットフォームパイプライン上のTFXやTFXエアフローチュートリアルが他のオーケストレーションシステムについて詳しく学ぶこと。
今、私たちは、作成LocalDagRunner
して渡すPipeline
我々はすでに定義された関数から作成されたオブジェクトを。
パイプラインは直接実行され、MLモデルのトレーニングを含むパイプラインの進行状況のログを確認できます。
tfx.orchestration.LocalDagRunner().run(
_create_pipeline(
pipeline_name=PIPELINE_NAME,
pipeline_root=PIPELINE_ROOT,
data_root=DATA_ROOT,
module_file=_trainer_module_file,
serving_model_dir=SERVING_MODEL_DIR,
metadata_path=METADATA_PATH))
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/penguin_trainer.py' (including modules: ['penguin_trainer']). INFO:absl:User module package has hash fingerprint version a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc. INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmp/tmp28n_co8j/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmpfb02sbta', '--dist-dir', '/tmp/tmpyu7gi15_'] /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. setuptools.SetuptoolsDeprecationWarning, listing git files failed - pretending there aren't any INFO:absl:Successfully built user code wheel distribution at 'pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl'; target user module is 'penguin_trainer'. INFO:absl:Full user module path is 'penguin_trainer@pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl' INFO:absl:Using deployment config: executor_specs { key: "CsvExampleGen" value { beam_executable_spec { python_executor_spec { class_path: "tfx.components.example_gen.csv_example_gen.executor.Executor" } } } } executor_specs { key: "Pusher" value { python_class_executable_spec { class_path: "tfx.components.pusher.executor.Executor" } } } executor_specs { key: "Trainer" value { python_class_executable_spec { class_path: "tfx.components.trainer.executor.GenericExecutor" } } } custom_driver_specs { key: "CsvExampleGen" value { python_class_executable_spec { class_path: "tfx.components.example_gen.driver.FileBasedDriver" } } } metadata_connection_config { sqlite { filename_uri: "metadata/penguin-simple/metadata.db" connection_mode: READWRITE_OPENCREATE } } INFO:absl:Using connection config: sqlite { filename_uri: "metadata/penguin-simple/metadata.db" connection_mode: READWRITE_OPENCREATE } INFO:absl:Component CsvExampleGen is running. INFO:absl:Running launcher for node_info { type { name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen" } id: "CsvExampleGen" } contexts { contexts { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } contexts { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } contexts { type { name: "node" } name { field_value { string_value: "penguin-simple.CsvExampleGen" } } } } outputs { outputs { key: "examples" value { artifact_spec { type { name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } } } } } } parameters { parameters { key: "input_base" value { field_value { string_value: "/tmp/tfx-dataijanq9u3" } } } parameters { key: "input_config" value { field_value { string_value: "{\n \"splits\": [\n {\n \"name\": \"single_split\",\n \"pattern\": \"*\"\n }\n ]\n}" } } } parameters { key: "output_config" value { field_value { string_value: "{\n \"split_config\": {\n \"splits\": [\n {\n \"hash_buckets\": 2,\n \"name\": \"train\"\n },\n {\n \"hash_buckets\": 1,\n \"name\": \"eval\"\n }\n ]\n }\n}" } } } parameters { key: "output_data_format" value { field_value { int_value: 6 } } } parameters { key: "output_file_format" value { field_value { int_value: 5 } } } } downstream_nodes: "Trainer" execution_options { caching_options { } } INFO:absl:MetadataStore with DB connection initialized running bdist_wheel running build running build_py creating build creating build/lib copying penguin_trainer.py -> build/lib installing to /tmp/tmpfb02sbta running install running install_lib copying build/lib/penguin_trainer.py -> /tmp/tmpfb02sbta running install_egg_info running egg_info creating tfx_user_code_Trainer.egg-info writing tfx_user_code_Trainer.egg-info/PKG-INFO writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt' reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt' writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt' Copying tfx_user_code_Trainer.egg-info to /tmp/tmpfb02sbta/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3.7.egg-info running install_scripts creating /tmp/tmpfb02sbta/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc.dist-info/WHEEL creating '/tmp/tmpyu7gi15_/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl' and adding '/tmp/tmpfb02sbta' to it adding 'penguin_trainer.py' adding 'tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc.dist-info/METADATA' adding 'tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc.dist-info/WHEEL' adding 'tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc.dist-info/top_level.txt' adding 'tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc.dist-info/RECORD' removing /tmp/tmpfb02sbta WARNING: Logging before InitGoogleLogging() is written to STDERR I1205 10:44:07.061197 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type I1205 10:44:07.067816 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type I1205 10:44:07.074599 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type I1205 10:44:07.081624 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type INFO:absl:select span and version = (0, None) INFO:absl:latest span and version = (0, None) INFO:absl:MetadataStore with DB connection initialized INFO:absl:Going to run a new execution 1 I1205 10:44:07.136307 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=1, input_dict={}, output_dict=defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "pipelines/penguin-simple/CsvExampleGen/examples/1" custom_properties { key: "input_fingerprint" value { string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1638701046,sum_checksum:1638701046" } } custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:CsvExampleGen:examples:0" } } custom_properties { key: "span" value { int_value: 0 } } , artifact_type: name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } )]}), exec_properties={'output_config': '{\n "split_config": {\n "splits": [\n {\n "hash_buckets": 2,\n "name": "train"\n },\n {\n "hash_buckets": 1,\n "name": "eval"\n }\n ]\n }\n}', 'input_base': '/tmp/tfx-dataijanq9u3', 'input_config': '{\n "splits": [\n {\n "name": "single_split",\n "pattern": "*"\n }\n ]\n}', 'output_file_format': 5, 'output_data_format': 6, 'span': 0, 'version': None, 'input_fingerprint': 'split:single_split,num_files:1,total_bytes:25648,xor_checksum:1638701046,sum_checksum:1638701046'}, execution_output_uri='pipelines/penguin-simple/CsvExampleGen/.system/executor_execution/1/executor_output.pb', stateful_working_dir='pipelines/penguin-simple/CsvExampleGen/.system/stateful_working_dir/2021-12-05T10:44:06.706974', tmp_dir='pipelines/penguin-simple/CsvExampleGen/.system/executor_execution/1/.temp/', pipeline_node=node_info { type { name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen" } id: "CsvExampleGen" } contexts { contexts { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } contexts { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } contexts { type { name: "node" } name { field_value { string_value: "penguin-simple.CsvExampleGen" } } } } outputs { outputs { key: "examples" value { artifact_spec { type { name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } } } } } } parameters { parameters { key: "input_base" value { field_value { string_value: "/tmp/tfx-dataijanq9u3" } } } parameters { key: "input_config" value { field_value { string_value: "{\n \"splits\": [\n {\n \"name\": \"single_split\",\n \"pattern\": \"*\"\n }\n ]\n}" } } } parameters { key: "output_config" value { field_value { string_value: "{\n \"split_config\": {\n \"splits\": [\n {\n \"hash_buckets\": 2,\n \"name\": \"train\"\n },\n {\n \"hash_buckets\": 1,\n \"name\": \"eval\"\n }\n ]\n }\n}" } } } parameters { key: "output_data_format" value { field_value { int_value: 6 } } } parameters { key: "output_file_format" value { field_value { int_value: 5 } } } } downstream_nodes: "Trainer" execution_options { caching_options { } } , pipeline_info=id: "penguin-simple" , pipeline_run_id='2021-12-05T10:44:06.706974') INFO:absl:Generating examples. WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features. INFO:absl:Processing input csv data /tmp/tfx-dataijanq9u3/* to TFExample. WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter. WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be. INFO:absl:Examples generated. INFO:absl:Cleaning up stateless execution info. INFO:absl:Execution 1 succeeded. INFO:absl:Cleaning up stateful execution info. INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "pipelines/penguin-simple/CsvExampleGen/examples/1" custom_properties { key: "input_fingerprint" value { string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1638701046,sum_checksum:1638701046" } } custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:CsvExampleGen:examples:0" } } custom_properties { key: "span" value { int_value: 0 } } custom_properties { key: "tfx_version" value { string_value: "1.4.0" } } , artifact_type: name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } )]}) for execution 1 INFO:absl:MetadataStore with DB connection initialized INFO:absl:Component CsvExampleGen is finished. INFO:absl:Component Trainer is running. INFO:absl:Running launcher for node_info { type { name: "tfx.components.trainer.component.Trainer" } id: "Trainer" } contexts { contexts { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } contexts { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } contexts { type { name: "node" } name { field_value { string_value: "penguin-simple.Trainer" } } } } inputs { inputs { key: "examples" value { channels { producer_node_query { id: "CsvExampleGen" } context_queries { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } context_queries { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } context_queries { type { name: "node" } name { field_value { string_value: "penguin-simple.CsvExampleGen" } } } artifact_query { type { name: "Examples" } } output_key: "examples" } min_count: 1 } } } outputs { outputs { key: "model" value { artifact_spec { type { name: "Model" } } } } outputs { key: "model_run" value { artifact_spec { type { name: "ModelRun" } } } } } parameters { parameters { key: "custom_config" value { field_value { string_value: "null" } } } parameters { key: "eval_args" value { field_value { string_value: "{\n \"num_steps\": 5\n}" } } } parameters { key: "module_path" value { field_value { string_value: "penguin_trainer@pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl" } } } parameters { key: "train_args" value { field_value { string_value: "{\n \"num_steps\": 100\n}" } } } } upstream_nodes: "CsvExampleGen" downstream_nodes: "Pusher" execution_options { caching_options { } } INFO:absl:MetadataStore with DB connection initialized INFO:absl:MetadataStore with DB connection initialized I1205 10:44:08.274386 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type INFO:absl:Going to run a new execution 2 INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=2, input_dict={'examples': [Artifact(artifact: id: 1 type_id: 15 uri: "pipelines/penguin-simple/CsvExampleGen/examples/1" properties { key: "split_names" value { string_value: "[\"train\", \"eval\"]" } } custom_properties { key: "file_format" value { string_value: "tfrecords_gzip" } } custom_properties { key: "input_fingerprint" value { string_value: "split:single_split,num_files:1,total_bytes:25648,xor_checksum:1638701046,sum_checksum:1638701046" } } custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:CsvExampleGen:examples:0" } } custom_properties { key: "payload_format" value { string_value: "FORMAT_TF_EXAMPLE" } } custom_properties { key: "span" value { int_value: 0 } } custom_properties { key: "tfx_version" value { string_value: "1.4.0" } } state: LIVE create_time_since_epoch: 1638701048257 last_update_time_since_epoch: 1638701048257 , artifact_type: id: 15 name: "Examples" properties { key: "span" value: INT } properties { key: "split_names" value: STRING } properties { key: "version" value: INT } )]}, output_dict=defaultdict(<class 'list'>, {'model': [Artifact(artifact: uri: "pipelines/penguin-simple/Trainer/model/2" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Trainer:model:0" } } , artifact_type: name: "Model" )], 'model_run': [Artifact(artifact: uri: "pipelines/penguin-simple/Trainer/model_run/2" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Trainer:model_run:0" } } , artifact_type: name: "ModelRun" )]}), exec_properties={'custom_config': 'null', 'module_path': 'penguin_trainer@pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl', 'train_args': '{\n "num_steps": 100\n}', 'eval_args': '{\n "num_steps": 5\n}'}, execution_output_uri='pipelines/penguin-simple/Trainer/.system/executor_execution/2/executor_output.pb', stateful_working_dir='pipelines/penguin-simple/Trainer/.system/stateful_working_dir/2021-12-05T10:44:06.706974', tmp_dir='pipelines/penguin-simple/Trainer/.system/executor_execution/2/.temp/', pipeline_node=node_info { type { name: "tfx.components.trainer.component.Trainer" } id: "Trainer" } contexts { contexts { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } contexts { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } contexts { type { name: "node" } name { field_value { string_value: "penguin-simple.Trainer" } } } } inputs { inputs { key: "examples" value { channels { producer_node_query { id: "CsvExampleGen" } context_queries { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } context_queries { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } context_queries { type { name: "node" } name { field_value { string_value: "penguin-simple.CsvExampleGen" } } } artifact_query { type { name: "Examples" } } output_key: "examples" } min_count: 1 } } } outputs { outputs { key: "model" value { artifact_spec { type { name: "Model" } } } } outputs { key: "model_run" value { artifact_spec { type { name: "ModelRun" } } } } } parameters { parameters { key: "custom_config" value { field_value { string_value: "null" } } } parameters { key: "eval_args" value { field_value { string_value: "{\n \"num_steps\": 5\n}" } } } parameters { key: "module_path" value { field_value { string_value: "penguin_trainer@pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl" } } } parameters { key: "train_args" value { field_value { string_value: "{\n \"num_steps\": 100\n}" } } } } upstream_nodes: "CsvExampleGen" downstream_nodes: "Pusher" execution_options { caching_options { } } , pipeline_info=id: "penguin-simple" , pipeline_run_id='2021-12-05T10:44:06.706974') INFO:absl:Train on the 'train' split when train_args.splits is not set. INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set. INFO:absl:udf_utils.get_fn {'custom_config': 'null', 'module_path': 'penguin_trainer@pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl', 'train_args': '{\n "num_steps": 100\n}', 'eval_args': '{\n "num_steps": 5\n}'} 'run_fn' INFO:absl:Installing 'pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl' to a temporary directory. INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmp/tmp9yk6w_js', 'pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl'] Processing ./pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl INFO:absl:Successfully installed 'pipelines/penguin-simple/_wheels/tfx_user_code_Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc-py3-none-any.whl'. INFO:absl:Training model. INFO:absl:Feature body_mass_g has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_depth_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature flipper_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature species has a shape dim { size: 1 } . Setting to DenseTensor. Installing collected packages: tfx-user-code-Trainer Successfully installed tfx-user-code-Trainer-0.0+a7e2e8dccbb913b74904edeec5549d868a2ea392bcd84fbc1965aba698dce3fc INFO:absl:Feature body_mass_g has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_depth_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature flipper_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature species has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature body_mass_g has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_depth_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature flipper_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature species has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature body_mass_g has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_depth_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature culmen_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature flipper_length_mm has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Feature species has a shape dim { size: 1 } . Setting to DenseTensor. INFO:absl:Model: "model" INFO:absl:__________________________________________________________________________________________________ INFO:absl:Layer (type) Output Shape Param # Connected to INFO:absl:================================================================================================== INFO:absl:culmen_length_mm (InputLayer) [(None, 1)] 0 INFO:absl:__________________________________________________________________________________________________ INFO:absl:culmen_depth_mm (InputLayer) [(None, 1)] 0 INFO:absl:__________________________________________________________________________________________________ INFO:absl:flipper_length_mm (InputLayer) [(None, 1)] 0 INFO:absl:__________________________________________________________________________________________________ INFO:absl:body_mass_g (InputLayer) [(None, 1)] 0 INFO:absl:__________________________________________________________________________________________________ INFO:absl:concatenate (Concatenate) (None, 4) 0 culmen_length_mm[0][0] INFO:absl: culmen_depth_mm[0][0] INFO:absl: flipper_length_mm[0][0] INFO:absl: body_mass_g[0][0] INFO:absl:__________________________________________________________________________________________________ INFO:absl:dense (Dense) (None, 8) 40 concatenate[0][0] INFO:absl:__________________________________________________________________________________________________ INFO:absl:dense_1 (Dense) (None, 8) 72 dense[0][0] INFO:absl:__________________________________________________________________________________________________ INFO:absl:dense_2 (Dense) (None, 3) 27 dense_1[0][0] INFO:absl:================================================================================================== INFO:absl:Total params: 139 INFO:absl:Trainable params: 139 INFO:absl:Non-trainable params: 0 INFO:absl:__________________________________________________________________________________________________ 100/100 [==============================] - 1s 3ms/step - loss: 0.4074 - sparse_categorical_accuracy: 0.8755 - val_loss: 0.0760 - val_sparse_categorical_accuracy: 0.9800 2021-12-05 10:44:13.263941: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them. INFO:tensorflow:Assets written to: pipelines/penguin-simple/Trainer/model/2/Format-Serving/assets INFO:tensorflow:Assets written to: pipelines/penguin-simple/Trainer/model/2/Format-Serving/assets INFO:absl:Training complete. Model written to pipelines/penguin-simple/Trainer/model/2/Format-Serving. ModelRun written to pipelines/penguin-simple/Trainer/model_run/2 INFO:absl:Cleaning up stateless execution info. INFO:absl:Execution 2 succeeded. INFO:absl:Cleaning up stateful execution info. INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'model': [Artifact(artifact: uri: "pipelines/penguin-simple/Trainer/model/2" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Trainer:model:0" } } custom_properties { key: "tfx_version" value { string_value: "1.4.0" } } , artifact_type: name: "Model" )], 'model_run': [Artifact(artifact: uri: "pipelines/penguin-simple/Trainer/model_run/2" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Trainer:model_run:0" } } custom_properties { key: "tfx_version" value { string_value: "1.4.0" } } , artifact_type: name: "ModelRun" )]}) for execution 2 INFO:absl:MetadataStore with DB connection initialized INFO:absl:Component Trainer is finished. I1205 10:44:13.795414 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type INFO:absl:Component Pusher is running. I1205 10:44:13.799805 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type INFO:absl:Running launcher for node_info { type { name: "tfx.components.pusher.component.Pusher" } id: "Pusher" } contexts { contexts { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } contexts { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } contexts { type { name: "node" } name { field_value { string_value: "penguin-simple.Pusher" } } } } inputs { inputs { key: "model" value { channels { producer_node_query { id: "Trainer" } context_queries { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } context_queries { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } context_queries { type { name: "node" } name { field_value { string_value: "penguin-simple.Trainer" } } } artifact_query { type { name: "Model" } } output_key: "model" } } } } outputs { outputs { key: "pushed_model" value { artifact_spec { type { name: "PushedModel" } } } } } parameters { parameters { key: "custom_config" value { field_value { string_value: "null" } } } parameters { key: "push_destination" value { field_value { string_value: "{\n \"filesystem\": {\n \"base_directory\": \"serving_model/penguin-simple\"\n }\n}" } } } } upstream_nodes: "Trainer" execution_options { caching_options { } } INFO:absl:MetadataStore with DB connection initialized I1205 10:44:13.821346 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type INFO:absl:MetadataStore with DB connection initialized INFO:absl:Going to run a new execution 3 INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=3, input_dict={'model': [Artifact(artifact: id: 2 type_id: 17 uri: "pipelines/penguin-simple/Trainer/model/2" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Trainer:model:0" } } custom_properties { key: "tfx_version" value { string_value: "1.4.0" } } state: LIVE create_time_since_epoch: 1638701053803 last_update_time_since_epoch: 1638701053803 , artifact_type: id: 17 name: "Model" )]}, output_dict=defaultdict(<class 'list'>, {'pushed_model': [Artifact(artifact: uri: "pipelines/penguin-simple/Pusher/pushed_model/3" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Pusher:pushed_model:0" } } , artifact_type: name: "PushedModel" )]}), exec_properties={'push_destination': '{\n "filesystem": {\n "base_directory": "serving_model/penguin-simple"\n }\n}', 'custom_config': 'null'}, execution_output_uri='pipelines/penguin-simple/Pusher/.system/executor_execution/3/executor_output.pb', stateful_working_dir='pipelines/penguin-simple/Pusher/.system/stateful_working_dir/2021-12-05T10:44:06.706974', tmp_dir='pipelines/penguin-simple/Pusher/.system/executor_execution/3/.temp/', pipeline_node=node_info { type { name: "tfx.components.pusher.component.Pusher" } id: "Pusher" } contexts { contexts { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } contexts { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } contexts { type { name: "node" } name { field_value { string_value: "penguin-simple.Pusher" } } } } inputs { inputs { key: "model" value { channels { producer_node_query { id: "Trainer" } context_queries { type { name: "pipeline" } name { field_value { string_value: "penguin-simple" } } } context_queries { type { name: "pipeline_run" } name { field_value { string_value: "2021-12-05T10:44:06.706974" } } } context_queries { type { name: "node" } name { field_value { string_value: "penguin-simple.Trainer" } } } artifact_query { type { name: "Model" } } output_key: "model" } } } } outputs { outputs { key: "pushed_model" value { artifact_spec { type { name: "PushedModel" } } } } } parameters { parameters { key: "custom_config" value { field_value { string_value: "null" } } } parameters { key: "push_destination" value { field_value { string_value: "{\n \"filesystem\": {\n \"base_directory\": \"serving_model/penguin-simple\"\n }\n}" } } } } upstream_nodes: "Trainer" execution_options { caching_options { } } , pipeline_info=id: "penguin-simple" , pipeline_run_id='2021-12-05T10:44:06.706974') WARNING:absl:Pusher is going to push the model without validation. Consider using Evaluator or InfraValidator in your pipeline. INFO:absl:Model version: 1638701053 INFO:absl:Model written to serving path serving_model/penguin-simple/1638701053. INFO:absl:Model pushed to pipelines/penguin-simple/Pusher/pushed_model/3. INFO:absl:Cleaning up stateless execution info. INFO:absl:Execution 3 succeeded. INFO:absl:Cleaning up stateful execution info. INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'pushed_model': [Artifact(artifact: uri: "pipelines/penguin-simple/Pusher/pushed_model/3" custom_properties { key: "name" value { string_value: "penguin-simple:2021-12-05T10:44:06.706974:Pusher:pushed_model:0" } } custom_properties { key: "tfx_version" value { string_value: "1.4.0" } } , artifact_type: name: "PushedModel" )]}) for execution 3 INFO:absl:MetadataStore with DB connection initialized INFO:absl:Component Pusher is finished. I1205 10:44:13.851651 30480 rdbms_metadata_access_object.cc:686] No property is defined for the Type
「INFO:absl:ComponentPusherが終了しました」と表示されます。パイプラインが正常に終了した場合は、ログの最後に。のでPusher
構成要素は、パイプラインの最後のコンポーネントです。
プッシャーコンポーネントはに訓練されたモデルをプッシュSERVING_MODEL_DIR
あるserving_model/penguin-simple
前の手順で変数を変更しなかった場合は、ディレクトリ。 Colabの左側のパネルにあるファイルブラウザから、または次のコマンドを使用して、結果を確認できます。
# List files in created model directory.
find {SERVING_MODEL_DIR}
serving_model/penguin-simple serving_model/penguin-simple/1638701053 serving_model/penguin-simple/1638701053/keras_metadata.pb serving_model/penguin-simple/1638701053/assets serving_model/penguin-simple/1638701053/variables serving_model/penguin-simple/1638701053/variables/variables.data-00000-of-00001 serving_model/penguin-simple/1638701053/variables/variables.index serving_model/penguin-simple/1638701053/saved_model.pb
次のステップ
あなたは上でより多くのリソースを見つけることができますhttps://www.tensorflow.org/tfx/tutorials
参照してくださいTFXパイプラインが理解TFXの様々な概念についての詳細を学ぶために。