Create a TFX pipeline using templates

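Introduction

This document provides instructions for creating a TensorFlow Extended (TFX) pipeline using the templates that ship with the TFX Python package. Many of the instructions are Linux shell commands, which run on an AI Platform Notebooks instance. Corresponding Jupyter Notebook code cells that invoke those commands using the "!" prefix are provided.

You will build a pipeline using the Taxi Trips dataset released by the City of Chicago. We strongly encourage you to use this pipeline as a baseline and try building your own pipeline with your own dataset.

Step 1. Set up your environment.

AI Platform Pipelines prepares a development environment for building a pipeline, and a Kubeflow Pipelines cluster for running the newly built pipeline.

Install the tfx Python package with the kfp extra requirement.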

import sys
# Use the latest version of pip.
!pip install --upgrade pip
# Install tfx and kfp Python packages.
!pip install --upgrade "tfx[kfp]<2"

Let's check the version of TFX.

!python3 -c "from tfx import version ; print('TFX version: {}'.format(version.__version__))"

TFX version: 0.29.0

In AI Platform Pipelines, TFX runs in a hosted Kubernetes environment using Kubeflow Pipelines.

Let's set some environment variables to use Kubeflow Pipelines.

First, get your GCP project ID.

# Read GCP project id from env.
shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
GOOGLE_CLOUD_PROJECT=shell_output[0]
%env GOOGLE_CLOUD_PROJECT={GOOGLE_CLOUD_PROJECT}
print("GCP project ID:" + GOOGLE_CLOUD_PROJECT)

env: GOOGLE_CLOUD_PROJECT=tf-benchmark-dashboard
GCP project ID:tf-benchmark-dashboard

You also need access to your KFP cluster. You can reach it from the "AI Platform > Pipeline" menu in the Google Cloud Console. The "endpoint" of the KFP cluster can be found in the URL of the Pipelines dashboard, or you can get it from the URL of the Getting Started page where you launched this notebook. Let's create an ENDPOINT environment variable and set it to the KFP cluster endpoint. ENDPOINT should contain only the hostname part of the URL. For example, if the URL of the KFP dashboard is https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start, the ENDPOINT value is 1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com.

Note: You MUST set your ENDPOINT value below.

# This refers to the KFP cluster endpoint
ENDPOINT='' # Enter your ENDPOINT here.
if not ENDPOINT:
    from absl import logging
    logging.error('Set your ENDPOINT in this cell.')

ERROR:absl:Set your ENDPOINT in this cell.
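
As a minimal sketch of the hostname rule described above, the snippet below derives an ENDPOINT value from a dashboard URL using the standard-library urllib.parse module. The helper name endpoint_from_url is hypothetical and is not part of TFX or KFP; the URL is the example from this tutorial.

```python
from urllib.parse import urlparse


def endpoint_from_url(dashboard_url: str) -> str:
    """Return only the hostname part of a KFP dashboard URL."""
    # urlparse splits the URL; `hostname` drops the scheme, port, path and fragment.
    hostname = urlparse(dashboard_url).hostname
    if not hostname:
        raise ValueError('No hostname found in: ' + repr(dashboard_url))
    return hostname


# Example URL taken from the tutorial text; replace it with your own dashboard URL.
url = 'https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start'
print(endpoint_from_url(url))
```

The printed value could then be pasted into the ENDPOINT assignment in the cell above.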

Set the image name to tfx-pipeline under the current GCP project.

# Docker image name for the pipeline image.
CUSTOM_TFX_IMAGE='gcr.io/' + GOOGLE_CLOUD_PROJECT + '/tfx-pipeline'

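And that's it. You are ready to create a pipeline.

Step 2. Copy the predefined template to your project directory.

In this step, you will create a working pipeline project directory and files by copying additional files from a predefined template.

You may give your pipeline a different name by changing PIPELINE_NAME below. This also becomes the name of the project directory where your files are placed.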

PIPELINE_NAME="my_pipeline"
import os
PROJECT_DIR=os.path.join(os.path.expanduser("~"),"imported",PIPELINE_NAME)

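TFX includes the taxi template with the TFX Python package. If you are planning to solve a point-wise prediction problem, such as classification or regression, this template can be used as a starting point.

The tfx template copy CLI command copies the predefined template files into your project directory.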

!tfx template copy \
  --pipeline-name={PIPELINE_NAME} \
  --destination-path={PROJECT_DIR} \
  --model=taxi

CLI
Copying taxi pipeline template
kubeflow_runner.py -> /home/kbuilder/imported/my_pipeline/kubeflow_runner.py
kubeflow_v2_dag_runner.py -> /home/kbuilder/imported/my_pipeline/kubeflow_v2_dag_runner.py
features_test.py -> /home/kbuilder/imported/my_pipeline/models/features_test.py
model_test.py -> /home/kbuilder/imported/my_pipeline/models/estimator/model_test.py
constants.py -> /home/kbuilder/imported/my_pipeline/models/estimator/constants.py
model.py -> /home/kbuilder/imported/my_pipeline/models/estimator/model.py
__init__.py -> /home/kbuilder/imported/my_pipeline/models/estimator/__init__.py
model_test.py -> /home/kbuilder/imported/my_pipeline/models/keras/model_test.py
constants.py -> /home/kbuilder/imported/my_pipeline/models/keras/constants.py
model.py -> /home/kbuilder/imported/my_pipeline/models/keras/model.py
__init__.py -> /home/kbuilder/imported/my_pipeline/models/keras/__init__.py
preprocessing_test.py -> /home/kbuilder/imported/my_pipeline/models/preprocessing_test.py
preprocessing.py -> /home/kbuilder/imported/my_pipeline/models/preprocessing.py
__init__.py -> /home/kbuilder/imported/my_pipeline/models/__init__.py
features.py -> /home/kbuilder/imported/my_pipeline/models/features.py
pipeline.py -> /home/kbuilder/imported/my_pipeline/pipeline/pipeline.py
configs.py -> /home/kbuilder/imported/my_pipeline/pipeline/configs.py
__init__.py -> /home/kbuilder/imported/my_pipeline/pipeline/__init__.py
local_runner.py -> /home/kbuilder/imported/my_pipeline/local_runner.py
model_analysis.ipynb -> /home/kbuilder/imported/my_pipeline/model_analysis.ipynb
__init__.py -> /home/kbuilder/imported/my_pipeline/__init__.py
data_validation.ipynb -> /home/kbuilder/imported/my_pipeline/data_validation.ipynb
.gitignore -> /home/kbuilder/imported/my_pipeline/.gitignore
Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/bin/tfx", line 8, in <module>
    sys.exit(cli_group())
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/decorators.py", line 73, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx/tools/cli/commands/template.py", line 73, in copy
    template_handler.copy_template(ctx.flags_dict)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx/tools/cli/handler/template_handler.py", line 185, in copy_template
    fileio.copy(src_path, dst_path)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx/dsl/io/fileio.py", line 51, in copy
    src_fs.copy(src, dst, overwrite=overwrite)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tfx/dsl/io/plugins/tensorflow_gfile.py", line 48, in copy
    tf.io.gfile.copy(src, dst, overwrite=overwrite)
  File "/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 516, in copy_v2
    compat.path_to_bytes(src), compat.path_to_bytes(dst), overwrite)
tensorflow.python.framework.errors_impl.AlreadyExistsError: file already exists
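
The recorded output above ends in AlreadyExistsError, which suggests the destination directory already held files from an earlier copy. As a hedged sketch, you could check the directory before running tfx template copy. The helper destination_is_clear is hypothetical and is not part of the TFX CLI.

```python
import os


def destination_is_clear(path: str) -> bool:
    """Return True if `path` does not exist yet or is an empty directory."""
    if not os.path.exists(path):
        return True
    # An existing, non-empty directory may hold files from an earlier copy.
    return os.path.isdir(path) and not os.listdir(path)


# Example path only; in the notebook you would pass PROJECT_DIR instead.
example_dir = os.path.join(os.path.expanduser('~'), 'imported', 'my_pipeline')
if not destination_is_clear(example_dir):
    print('Destination already has files: ' + example_dir)
```

If the check reports existing files, choosing a different PIPELINE_NAME or removing the old directory first is one way to avoid the conflict.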

Change the working directory context in this notebook to the project directory.

%cd {PROJECT_DIR}

/home/kbuilder/imported/my_pipeline

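Note: Don't forget to change directory in the File Browser on the left by clicking into the project directory once it is created.

Step 3. Browse your copied source files

The TFX template provides basic scaffold files for building a pipeline, including Python source code, sample data, and Jupyter Notebooks for analysing the output of the pipeline. The taxi template uses the same Chicago Taxi dataset and ML model as the Airflow Tutorial.

Here is a brief introduction to each of the Python files.

pipeline - This directory contains the definition of the pipeline.
- configs.py - defines common constants for pipeline runners
- pipeline.py - defines TFX components and a pipeline
models - This directory contains ML model definitions.
- features.py, features_test.py - define features for the model
- preprocessing.py, preprocessing_test.py - define preprocessing jobs using tf::Transform
- estimator - This directory contains an Estimator-based model.
  - constants.py - defines constants of the model
  - model.py, model_test.py - define a DNN model using TF Estimator
- keras - This directory contains a Keras-based model.
  - constants.py - defines constants of the model
  - model.py, model_test.py - define a DNN model using Keras
local_runner.py, kubeflow_runner.py - define runners for each orchestration engine

You might notice that some files have _test.py in their name. These are unit tests of the pipeline, and it is recommended to add more unit tests as you implement your own pipelines. You can run unit tests by supplying the module name of a test file with the -m flag. You can usually get the module name by deleting the .py extension and replacing / with a dot. For example:

!{sys.executable} -m models.features_test
!{sys.executable} -m models.keras.model_test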

Running tests under Python 3.7.5: /tmpfs/src/tf_docs_env/bin/python
[ RUN      ] FeaturesTest.testNumberOfBucketFeatureBucketCount
INFO:tensorflow:time(__main__.FeaturesTest.testNumberOfBucketFeatureBucketCount): 0.0s
I1204 11:33:54.064224 139808961349440 test_util.py:2076] time(__main__.FeaturesTest.testNumberOfBucketFeatureBucketCount): 0.0s
[       OK ] FeaturesTest.testNumberOfBucketFeatureBucketCount
[ RUN      ] FeaturesTest.testTransformedNames
INFO:tensorflow:time(__main__.FeaturesTest.testTransformedNames): 0.0s
I1204 11:33:54.064666 139808961349440 test_util.py:2076] time(__main__.FeaturesTest.testTransformedNames): 0.0s
[       OK ] FeaturesTest.testTransformedNames
[ RUN      ] FeaturesTest.test_session
[  SKIPPED ] FeaturesTest.test_session
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK (skipped=1)
Running tests under Python 3.7.5: /tmpfs/src/tf_docs_env/bin/python
[ RUN      ] ModelTest.testBuildKerasModel
2021-12-04 11:33:57.507456: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 11:33:57.508566: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I1204 11:33:57.581331 139740839778112 layer_utils.py:191] Model: "model"
I1204 11:33:57.581501 139740839778112 layer_utils.py:192] __________________________________________________________________________________________________
I1204 11:33:57.581558 139740839778112 layer_utils.py:189] Layer (type)                    Output Shape         Param #     Connected to                     
I1204 11:33:57.581596 139740839778112 layer_utils.py:194] ==================================================================================================
I1204 11:33:57.581741 139740839778112 layer_utils.py:189] pickup_latitude_xf (InputLayer) [(None,)]            0                                            
I1204 11:33:57.581793 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.581883 139740839778112 layer_utils.py:189] trip_miles_xf (InputLayer)      [(None,)]            0                                            
I1204 11:33:57.581926 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.582010 139740839778112 layer_utils.py:189] trip_start_hour_xf (InputLayer) [(None,)]            0                                            
I1204 11:33:57.582052 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.582189 139740839778112 layer_utils.py:189] dense_features (DenseFeatures)  (None, 1)            0           pickup_latitude_xf[0][0]         
I1204 11:33:57.582241 139740839778112 layer_utils.py:189]                                                                  trip_miles_xf[0][0]              
I1204 11:33:57.582280 139740839778112 layer_utils.py:189]                                                                  trip_start_hour_xf[0][0]         
I1204 11:33:57.582315 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.582462 139740839778112 layer_utils.py:189] dense (Dense)                   (None, 1)            2           dense_features[0][0]             
I1204 11:33:57.582518 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.582629 139740839778112 layer_utils.py:189] dense_1 (Dense)                 (None, 1)            2           dense[0][0]                      
I1204 11:33:57.582674 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.582824 139740839778112 layer_utils.py:189] dense_features_1 (DenseFeatures (None, 34)           0           pickup_latitude_xf[0][0]         
I1204 11:33:57.582879 139740839778112 layer_utils.py:189]                                                                  trip_miles_xf[0][0]              
I1204 11:33:57.582921 139740839778112 layer_utils.py:189]                                                                  trip_start_hour_xf[0][0]         
I1204 11:33:57.582957 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.583053 139740839778112 layer_utils.py:189] concatenate (Concatenate)       (None, 35)           0           dense_1[0][0]                    
I1204 11:33:57.583099 139740839778112 layer_utils.py:189]                                                                  dense_features_1[0][0]           
I1204 11:33:57.583143 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.583260 139740839778112 layer_utils.py:189] dense_2 (Dense)                 (None, 1)            36          concatenate[0][0]                
I1204 11:33:57.583309 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.583389 139740839778112 layer_utils.py:189] tf.compat.v1.squeeze (TFOpLambd (None,)              0           dense_2[0][0]                    
I1204 11:33:57.583432 139740839778112 layer_utils.py:256] ==================================================================================================
I1204 11:33:57.583687 139740839778112 layer_utils.py:267] Total params: 40
I1204 11:33:57.583751 139740839778112 layer_utils.py:268] Trainable params: 40
I1204 11:33:57.583794 139740839778112 layer_utils.py:269] Non-trainable params: 0
I1204 11:33:57.583832 139740839778112 layer_utils.py:270] __________________________________________________________________________________________________
I1204 11:33:57.649701 139740839778112 layer_utils.py:191] Model: "model_1"
I1204 11:33:57.649825 139740839778112 layer_utils.py:192] __________________________________________________________________________________________________
I1204 11:33:57.649878 139740839778112 layer_utils.py:189] Layer (type)                    Output Shape         Param #     Connected to                     
I1204 11:33:57.649932 139740839778112 layer_utils.py:194] ==================================================================================================
I1204 11:33:57.650066 139740839778112 layer_utils.py:189] pickup_latitude_xf (InputLayer) [(None,)]            0                                            
I1204 11:33:57.650120 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.650207 139740839778112 layer_utils.py:189] trip_miles_xf (InputLayer)      [(None,)]            0                                            
I1204 11:33:57.650259 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.650356 139740839778112 layer_utils.py:189] trip_start_hour_xf (InputLayer) [(None,)]            0                                            
I1204 11:33:57.650398 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.650552 139740839778112 layer_utils.py:189] dense_features_2 (DenseFeatures (None, 1)            0           pickup_latitude_xf[0][0]         
I1204 11:33:57.650603 139740839778112 layer_utils.py:189]                                                                  trip_miles_xf[0][0]              
I1204 11:33:57.650644 139740839778112 layer_utils.py:189]                                                                  trip_start_hour_xf[0][0]         
I1204 11:33:57.650682 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.650812 139740839778112 layer_utils.py:189] dense_3 (Dense)                 (None, 1)            2           dense_features_2[0][0]           
I1204 11:33:57.650864 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.651007 139740839778112 layer_utils.py:189] dense_features_3 (DenseFeatures (None, 34)           0           pickup_latitude_xf[0][0]         
I1204 11:33:57.651061 139740839778112 layer_utils.py:189]                                                                  trip_miles_xf[0][0]              
I1204 11:33:57.651102 139740839778112 layer_utils.py:189]                                                                  trip_start_hour_xf[0][0]         
I1204 11:33:57.651146 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.651229 139740839778112 layer_utils.py:189] concatenate_1 (Concatenate)     (None, 35)           0           dense_3[0][0]                    
I1204 11:33:57.651274 139740839778112 layer_utils.py:189]                                                                  dense_features_3[0][0]           
I1204 11:33:57.651311 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.651462 139740839778112 layer_utils.py:189] dense_4 (Dense)                 (None, 1)            36          concatenate_1[0][0]              
I1204 11:33:57.651547 139740839778112 layer_utils.py:258] __________________________________________________________________________________________________
I1204 11:33:57.651632 139740839778112 layer_utils.py:189] tf.compat.v1.squeeze_1 (TFOpLam (None,)              0           dense_4[0][0]                    
I1204 11:33:57.651675 139740839778112 layer_utils.py:256] ==================================================================================================
I1204 11:33:57.651959 139740839778112 layer_utils.py:267] Total params: 38
I1204 11:33:57.652019 139740839778112 layer_utils.py:268] Trainable params: 38
I1204 11:33:57.652061 139740839778112 layer_utils.py:269] Non-trainable params: 0
I1204 11:33:57.652098 139740839778112 layer_utils.py:270] __________________________________________________________________________________________________
INFO:tensorflow:time(__main__.ModelTest.testBuildKerasModel): 0.84s
I1204 11:33:57.652639 139740839778112 test_util.py:2076] time(__main__.ModelTest.testBuildKerasModel): 0.84s
[       OK ] ModelTest.testBuildKerasModel
[ RUN      ] ModelTest.test_session
[  SKIPPED ] ModelTest.test_session
----------------------------------------------------------------------
Ran 2 tests in 0.836s

OK (skipped=1)
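
The path-to-module rule described earlier in this step (drop the .py extension, replace / with a dot) can be sketched as a small helper. The function name module_name_from_path is hypothetical and is used here for illustration only.

```python
def module_name_from_path(test_file_path: str) -> str:
    """Convert a relative path such as models/features_test.py to a module name."""
    if not test_file_path.endswith('.py'):
        raise ValueError('Expected a .py file: ' + test_file_path)
    # Drop the .py extension, then replace path separators with dots.
    return test_file_path[:-len('.py')].replace('/', '.')


print(module_name_from_path('models/features_test.py'))
print(module_name_from_path('models/keras/model_test.py'))
```

The resulting names are what you would pass after the -m flag.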

Step 4. Run your first TFX pipeline

Components in the TFX pipeline generate outputs for each run as ML Metadata Artifacts, and these need to be stored somewhere. You can use any storage that the KFP cluster can access; this example uses Google Cloud Storage (GCS). A default GCS bucket should have been created automatically. Its name is <your-project-id>-kubeflowpipelines-default.

Let's upload the sample data to the GCS bucket so that the pipeline can use it later.

!gsutil cp data/data.csv gs://{GOOGLE_CLOUD_PROJECT}-kubeflowpipelines-default/tfx-template/data/taxi/data.csv

BucketNotFoundException: 404 gs://tf-benchmark-dashboard-kubeflowpipelines-default bucket does not exist.
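
The recorded output above shows that the default bucket may not exist in every project. As a minimal sketch, the helpers below build the default bucket name and the sample data URI from a project ID, following the naming pattern given in this step, so you can check the values before uploading. Both function names and the project ID are hypothetical.

```python
def default_bucket_name(project_id: str) -> str:
    """Build the default bucket name using the pattern described in this step."""
    return project_id + '-kubeflowpipelines-default'


def sample_data_uri(project_id: str) -> str:
    """Build the GCS URI that the gsutil command above copies data.csv to."""
    return ('gs://' + default_bucket_name(project_id) +
            '/tfx-template/data/taxi/data.csv')


# 'my-gcp-project' is a placeholder; use your own GOOGLE_CLOUD_PROJECT value.
print(default_bucket_name('my-gcp-project'))
print(sample_data_uri('my-gcp-project'))
```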

Let's create a TFX pipeline using the tfx pipeline create command.

Note: When creating a pipeline for KFP, you need a container image that will be used to run the pipeline, and skaffold builds that image for you. Because skaffold pulls base images from Docker Hub, the first build takes 5 to 10 minutes, but later builds take much less time.

!tfx pipeline create  --pipeline-path=kubeflow_runner.py --endpoint={ENDPOINT} \
--build-image

CLI
Usage: tfx pipeline create [OPTIONS]
Try 'tfx pipeline create --help' for help.

Error: no such option: --build-image

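While the pipeline is being created, a Dockerfile is generated for building the Docker image. Don't forget to add it to your source control system (for example, git) along with the other source files.

Now start an execution run of the newly created pipeline using the tfx run create command.

!tfx run create --pipeline-name={PIPELINE_NAME} --endpoint={ENDPOINT}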

CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
Pipeline "my_pipeline" does not exist.

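Alternatively, you can run the pipeline from the KFP Dashboard. The new execution run is listed under Experiments in the KFP Dashboard. Clicking into the experiment lets you monitor progress and visualize the artifacts created during the execution run.

However, we recommend visiting the KFP Dashboard. You can access it from the Cloud AI Platform Pipelines menu in the Google Cloud Console. Once you visit the dashboard, you can find the pipeline and access a wealth of information about it. For example, you can find your runs under the Experiments menu, and when you open an execution run under Experiments, you can find all the artifacts from the pipeline under the Artifacts menu.

Note: If your pipeline run fails, you can see detailed logs for each TFX component in the Experiments tab of the KFP Dashboard.

One of the major sources of failure is permission-related problems. Make sure your KFP cluster has permission to access Google Cloud APIs. This can be configured when you create a KFP cluster in GCP; otherwise, see the Troubleshooting document in GCP.

Step 5. Add components for data validation.

In this step, you will add components for data validation, including StatisticsGen, SchemaGen, and ExampleValidator. If you are interested in data validation, see Get started with Tensorflow Data Validation.

Double-click to change directory to pipeline, then double-click again to open pipeline.py. Find and uncomment the 3 lines that add StatisticsGen, SchemaGen, and ExampleValidator to the pipeline. (Tip: search for comments containing TODO(step 5):). Make sure to save pipeline.py after you edit it.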
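
If you prefer to locate the markers programmatically rather than by searching in the editor, the following sketch lists the lines that contain a given TODO(step N): marker. The helper find_step_todos is hypothetical, and the sample text is an illustrative stand-in, not the actual contents of pipeline.py.

```python
def find_step_todos(source_text: str, step: int) -> list:
    """Return (line_number, stripped_line) pairs that mention TODO(step N):."""
    marker = 'TODO(step {}):'.format(step)
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if marker in line
    ]


# Illustrative stand-in text; the real pipeline.py may look different.
sample = '''\
components = []
# TODO(step 5): Uncomment here to add StatisticsGen to the pipeline.
# components.append(statistics_gen)
# TODO(step 6): Uncomment here to add Transform to the pipeline.
# components.append(transform)
'''

for number, line in find_step_todos(sample, 5):
    print(number, line)
```

To use it on the real file, you could read pipeline/pipeline.py with open(...).read() and pass the text in place of sample.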

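You now need to update the existing pipeline with the modified pipeline definition. Use the tfx pipeline update command to update your pipeline, followed by the tfx run create command to create a new execution run of the updated pipeline.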

# Update the pipeline
!tfx pipeline update \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT}
# You can run the pipeline the same way.
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Updating pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
beam runner not found in dsl.
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
Pipeline "my_pipeline" does not exist.

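Check pipeline outputs

Visit the KFP Dashboard to find the pipeline outputs on the page for your pipeline run. Click the Experiments tab on the left, then All runs on the Experiments page. You should be able to find the latest run under the name of your pipeline.

Step 6. Add components for training.

In this step, you will add components for training and model validation, including Transform, Trainer, Resolver, Evaluator, and Pusher.

Double-click to open pipeline.py. Find and uncomment the 5 lines that add Transform, Trainer, Resolver, Evaluator, and Pusher to the pipeline. (Tip: search for TODO(step 6):)

As before, you now need to update the existing pipeline with the modified pipeline definition. The instructions are the same as in Step 5: update the pipeline using tfx pipeline update, then create an execution run using tfx run create.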

!tfx pipeline update \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Updating pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
beam runner not found in dsl.
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
Pipeline "my_pipeline" does not exist.

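When this execution run finishes successfully, you have created and run your first TFX pipeline in AI Platform Pipelines!

Step 7. (Optional) Try BigQueryExampleGen

BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. BigQuery can be used as a source of training examples in TFX. In this step, you will add BigQueryExampleGen to the pipeline.

Double-click to open pipeline.py. Comment out CsvExampleGen and uncomment the line that creates an instance of BigQueryExampleGen. You also need to uncomment the query argument of the create_pipeline function.

You need to specify which GCP project to use for BigQuery. This is done by setting --project in beam_pipeline_args when creating a pipeline.

Double-click to open configs.py. Uncomment the definitions of GOOGLE_CLOUD_REGION, BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS, and BIG_QUERY_QUERY. Replace the region value in this file with the correct value for your GCP project.

Note: You MUST set your GCP region in the configs.py file before proceeding.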
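
As a rough illustration of setting --project in beam_pipeline_args, the sketch below builds such a list. The project ID and bucket name are hypothetical placeholders, and the --temp_location entry is an assumption added here (a standard Apache Beam option); the actual definition in the template's configs.py may differ.

```python
# Hypothetical placeholder values; replace them with your own project and bucket.
GOOGLE_CLOUD_PROJECT = 'my-gcp-project'
GCS_BUCKET_NAME = GOOGLE_CLOUD_PROJECT + '-kubeflowpipelines-default'

# Beam options are plain command-line style strings.
beam_pipeline_args = [
    # Tells Beam which GCP project to use for BigQuery.
    '--project=' + GOOGLE_CLOUD_PROJECT,
    # Assumption: a GCS location for temporary files written while reading from BigQuery.
    '--temp_location=gs://' + GCS_BUCKET_NAME + '/tmp',
]

print(beam_pipeline_args)
```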

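Change directory one level up by clicking the name of the directory above the file list. The directory name is the name of the pipeline, which is my_pipeline if you didn't change it.

Double-click to open kubeflow_runner.py. Uncomment the two arguments, query and beam_pipeline_args, for the create_pipeline function.

The pipeline is now ready to use BigQuery as an example source. Update the pipeline as before and create a new execution run as you did in steps 5 and 6.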

!tfx pipeline update \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Updating pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
beam runner not found in dsl.
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
Pipeline "my_pipeline" does not exist.

Step 8. (Optional) Try Dataflow with KFP

Several TFX components use Apache Beam to implement data-parallel pipelines, which means you can distribute data processing workloads using Google Cloud Dataflow. In this step, you will set the Kubeflow orchestrator to use Dataflow as the data processing back-end for Apache Beam.

Double-click pipeline to change directory, then double-click to open configs.py. Uncomment the definitions of GOOGLE_CLOUD_REGION and DATAFLOW_BEAM_PIPELINE_ARGS.
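
For orientation, a list of Beam arguments that targets Dataflow might look like the sketch below. The helper dataflow_beam_args and all values are hypothetical; the flags shown (--runner, --project, --region, --temp_location) are standard Apache Beam pipeline options, but the exact contents of DATAFLOW_BEAM_PIPELINE_ARGS in the template are not reproduced here and may include other entries.

```python
def dataflow_beam_args(project: str, region: str, bucket: str) -> list:
    """Build a minimal list of Beam options for running on Dataflow."""
    return [
        '--project=' + project,
        # Selects Dataflow instead of the default local (direct) runner.
        '--runner=DataflowRunner',
        # Dataflow needs a GCS location for temporary files.
        '--temp_location=gs://' + bucket + '/tmp',
        '--region=' + region,
    ]


# Placeholder values for illustration only.
args = dataflow_beam_args(
    project='my-gcp-project',
    region='us-central1',
    bucket='my-gcp-project-kubeflowpipelines-default',
)
print(args)
```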

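Change directory one level up by clicking the name of the directory above the file list. The directory name is the name of the pipeline, which is my_pipeline if you didn't change it.

Double-click to open kubeflow_runner.py. Uncomment beam_pipeline_args. (Also make sure to comment out the current beam_pipeline_args that you added in Step 7.)

The pipeline is now ready to use Dataflow. Update the pipeline and create an execution run as you did in steps 5 and 6.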

!tfx pipeline update \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Updating pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
beam runner not found in dsl.
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
Pipeline "my_pipeline" does not exist.

You can find your Dataflow jobs under Dataflow in the Cloud Console.

Step 9. (Optional) Try Cloud AI Platform Training and Prediction with KFP

TFX interoperates with several managed GCP services, such as Cloud AI Platform for Training and Prediction. You can set your Trainer component to use Cloud AI Platform Training, a managed service for training ML models. Moreover, when your model is built and ready to be served, you can push it to Cloud AI Platform Prediction for serving. In this step, you will set the Trainer and Pusher components to use Cloud AI Platform services.

Before editing files, you might first have to enable the AI Platform Training & Prediction API.

Double-click pipeline to change directory, then double-click to open configs.py. Uncomment the definitions of GOOGLE_CLOUD_REGION, GCP_AI_PLATFORM_TRAINING_ARGS, and GCP_AI_PLATFORM_SERVING_ARGS. Because the custom-built container image is used to train the model in Cloud AI Platform Training, set masterConfig.imageUri in GCP_AI_PLATFORM_TRAINING_ARGS to the same value as CUSTOM_TFX_IMAGE above.
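
The nesting of masterConfig.imageUri can be sketched as a plain Python dictionary. This is an illustration only: the project and region values are placeholders, and the 'project' and 'region' keys are assumptions about what the training arguments contain. Only masterConfig.imageUri is named in the text above.

```python
# Placeholder values for illustration; use your own project and region.
GOOGLE_CLOUD_PROJECT = 'my-gcp-project'
GOOGLE_CLOUD_REGION = 'us-central1'

# Same composition as the CUSTOM_TFX_IMAGE cell in Step 1.
CUSTOM_TFX_IMAGE = 'gcr.io/' + GOOGLE_CLOUD_PROJECT + '/tfx-pipeline'

GCP_AI_PLATFORM_TRAINING_ARGS = {
    # Assumed keys; check the template's configs.py for the real ones.
    'project': GOOGLE_CLOUD_PROJECT,
    'region': GOOGLE_CLOUD_REGION,
    # masterConfig.imageUri points at the custom pipeline image.
    'masterConfig': {
        'imageUri': CUSTOM_TFX_IMAGE,
    },
}

print(GCP_AI_PLATFORM_TRAINING_ARGS['masterConfig']['imageUri'])
```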

Change directory one level up, then double-click to open kubeflow_runner.py. Uncomment ai_platform_training_args and ai_platform_serving_args.

Update the pipeline and create an execution run as you did in steps 5 and 6.

!tfx pipeline update \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Updating pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
beam runner not found in dsl.
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
[WARNING] Default engine will be changed to "local" in the near future.
Use --engine flag if you intend to use a different orchestrator.
Pipeline "my_pipeline" does not exist.

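You can find your training jobs in Cloud AI Platform Jobs. If your pipeline completed successfully, you can find your model in Cloud AI Platform Models.

Step 10. Ingest YOUR data to the pipeline

You have built a pipeline for a model using the Chicago Taxi dataset. Now it's time to put your own data into the pipeline.

Your data can be stored anywhere your pipeline can access, including GCS or BigQuery. You will need to modify the pipeline definition to access your data.

- If your data is stored in files, modify DATA_PATH in kubeflow_runner.py or local_runner.py and set it to the location of your files. If your data is stored in BigQuery, modify BIG_QUERY_QUERY in pipeline/configs.py so that it queries your data correctly.
- Add features in models/features.py.
- Modify models/preprocessing.py to transform the input data for training.
- Modify models/keras/model.py and models/keras/constants.py to describe your ML model.
  - You can use an Estimator-based model, too. Change the RUN_FN constant to models.estimator.model.run_fn in pipeline/configs.py.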
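
RUN_FN holds a dotted path such as models.estimator.model.run_fn. To show how a dotted path of this kind can map to a Python callable, here is a minimal sketch using the standard-library importlib module. The helper resolve_dotted_path is hypothetical, and TFX's own resolution logic may differ. The example resolves a standard-library function so it runs without the template installed.

```python
import importlib


def resolve_dotted_path(dotted_path: str):
    """Import the module part of `dotted_path` and return the named attribute."""
    module_path, _, attr_name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError('Expected a path like package.module.name: ' + dotted_path)
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated with math.sqrt; inside the project directory, a path like
# 'models.estimator.model.run_fn' would be resolved the same way.
sqrt = resolve_dotted_path('math.sqrt')
print(sqrt(9.0))
```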

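See the Trainer component guide for a more detailed introduction.

Cleaning up

To clean up all Google Cloud resources used in this project, you can delete the Google Cloud project you used for the tutorial.

Alternatively, you can clean up individual resources by visiting each console.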