Create a TFX pipeline using templates

Introduction

This document provides instructions to create a TensorFlow Extended (TFX) pipeline using templates provided with the TFX Python package. Many of the instructions are Linux shell commands that run on an AI Platform Notebooks instance; corresponding Jupyter Notebook code cells, which invoke those commands using !, are also provided.

You will build a pipeline using the Taxi Trips dataset released by the City of Chicago. We strongly encourage you to try building your own pipeline with your own dataset, using this pipeline as a baseline.

Step 1. Set up your environment.

AI Platform Pipelines prepares a development environment to build a pipeline, and a Kubeflow Pipelines cluster to run the newly built pipeline.

Note: During installation you might see errors like "ERROR: some-package 0.some_version.1 has requirement other-package!=2.0.,<3,>=1.15, but you'll have other-package 2.0.0 which is incompatible." Please ignore these errors for now.

Install tfx, kfp, and skaffold, and add the installation path to the PATH environment variable.

# Install tfx and kfp Python packages.
import sys
!{sys.executable} -m pip install --user --upgrade -q tfx==0.22.0
!{sys.executable} -m pip install --user --upgrade -q kfp==0.5.1
# Download skaffold and set it executable.
!curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64 && chmod +x skaffold && mv skaffold /home/jupyter/.local/bin/
ERROR: tensorflow-serving-api 2.2.0 has requirement tensorflow~=2.2.0, but you'll have tensorflow 2.3.0 which is incompatible.
ERROR: tensorflow-transform 0.22.0 has requirement tensorflow!=2.0.*,<2.3,>=1.15, but you'll have tensorflow 2.3.0 which is incompatible.
ERROR: tensorflow-model-analysis 0.22.2 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: google-cloud-bigquery 1.24.0 has requirement google-resumable-media<0.6dev,>=0.5.0, but you'll have google-resumable-media 0.7.0 which is incompatible.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.6M  100 44.6M    0     0  29.0M      0  0:00:01  0:00:01 --:--:-- 29.0M

# Set `PATH` to include user python binary directory and a directory containing `skaffold`.
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin
env: PATH=/tmpfs/src/tf_docs_env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/puppetlabs/bin:/opt/android-studio/current/bin:/usr/local/go/bin:/usr/local/go/packages/bin:/opt/kubernetes/client/bin/:/home/kbuilder/.local/bin:/home/jupyter/.local/bin

Let's check the version of TFX.

!python3 -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"
TFX version: 0.22.0

In AI Platform Pipelines, TFX is running in a hosted Kubernetes environment using Kubeflow Pipelines.

Let's set some environment variables to use Kubeflow Pipelines.

First, get your GCP project ID.

# Read GCP project id from env.
shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
GOOGLE_CLOUD_PROJECT=shell_output[0]
%env GOOGLE_CLOUD_PROJECT={GOOGLE_CLOUD_PROJECT}
print("GCP project ID:" + GOOGLE_CLOUD_PROJECT)
env: GOOGLE_CLOUD_PROJECT=tf-benchmark-dashboard
GCP project ID:tf-benchmark-dashboard

We also need to access your KFP cluster. You can access it in your Google Cloud Console under the "AI Platform > Pipeline" menu. The "endpoint" of the KFP cluster can be found in the URL of the Pipelines dashboard, or in the URL of the Getting Started page where you launched this notebook. Let's create an ENDPOINT environment variable and set it to the KFP cluster endpoint. ENDPOINT should contain only the hostname part of the URL. For example, if the URL of the KFP dashboard is https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start, the ENDPOINT value becomes 1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com.
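If you prefer to extract the hostname programmatically instead of copying it by hand, the standard library's urllib can do it. This is a hypothetical helper for illustration, not part of the template:

```python
from urllib.parse import urlparse

def endpoint_from_url(dashboard_url):
    """Return only the hostname part of a KFP dashboard URL."""
    return urlparse(dashboard_url).netloc

print(endpoint_from_url(
    'https://1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com/#/start'))
# 1e9deb537390ca22-dot-asia-east1.pipelines.googleusercontent.com
```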

# This refers to the KFP cluster endpoint
ENDPOINT='' # Enter your ENDPOINT here.
if not ENDPOINT:
    from absl import logging
    logging.error('Set your ENDPOINT in this cell.')
ERROR:absl:Set your ENDPOINT in this cell.

Set the image name as tfx-pipeline under the current GCP project.

# Docker image name for the pipeline image.
CUSTOM_TFX_IMAGE='gcr.io/' + GOOGLE_CLOUD_PROJECT + '/tfx-pipeline'

And, it's done. We are ready to create a pipeline.

Step 2. Copy the predefined template to your project directory.

In this step, we will create a working pipeline project directory and files by copying additional files from a predefined template.

You may give your pipeline a different name by changing the PIPELINE_NAME below. This will also become the name of the project directory where your files will be put.

PIPELINE_NAME="my_pipeline"
import os
PROJECT_DIR=os.path.join(os.path.expanduser("~"),"imported",PIPELINE_NAME)

TFX includes the taxi template with the TFX Python package. If you are planning to solve a point-wise prediction problem, including classification and regression, this template can be used as a starting point.

The tfx template copy CLI command copies predefined template files into your project directory.

!tfx template copy \
  --pipeline-name={PIPELINE_NAME} \
  --destination-path={PROJECT_DIR} \
  --model=taxi
CLI
Copying taxi pipeline template
model_test.py -> /home/kbuilder/imported/my_pipeline/models/keras/model_test.py
constants.py -> /home/kbuilder/imported/my_pipeline/models/keras/constants.py
model.py -> /home/kbuilder/imported/my_pipeline/models/keras/model.py
__init__.py -> /home/kbuilder/imported/my_pipeline/models/keras/__init__.py
__init__.py -> /home/kbuilder/imported/my_pipeline/models/__init__.py
preprocessing.py -> /home/kbuilder/imported/my_pipeline/models/preprocessing.py
features_test.py -> /home/kbuilder/imported/my_pipeline/models/features_test.py
features.py -> /home/kbuilder/imported/my_pipeline/models/features.py
preprocessing_test.py -> /home/kbuilder/imported/my_pipeline/models/preprocessing_test.py
model_test.py -> /home/kbuilder/imported/my_pipeline/models/estimator/model_test.py
constants.py -> /home/kbuilder/imported/my_pipeline/models/estimator/constants.py
model.py -> /home/kbuilder/imported/my_pipeline/models/estimator/model.py
__init__.py -> /home/kbuilder/imported/my_pipeline/models/estimator/__init__.py
beam_dag_runner.py -> /home/kbuilder/imported/my_pipeline/beam_dag_runner.py
model_analysis.ipynb -> /home/kbuilder/imported/my_pipeline/model_analysis.ipynb
kubeflow_dag_runner.py -> /home/kbuilder/imported/my_pipeline/kubeflow_dag_runner.py
__init__.py -> /home/kbuilder/imported/my_pipeline/__init__.py
.gitignore -> /home/kbuilder/imported/my_pipeline/.gitignore
__init__.py -> /home/kbuilder/imported/my_pipeline/pipeline/__init__.py
configs.py -> /home/kbuilder/imported/my_pipeline/pipeline/configs.py
pipeline.py -> /home/kbuilder/imported/my_pipeline/pipeline/pipeline.py
data_validation.ipynb -> /home/kbuilder/imported/my_pipeline/data_validation.ipynb

Change the working directory context in this notebook to the project directory.

%cd {PROJECT_DIR}
/home/kbuilder/imported/my_pipeline

Step 3. Browse your copied source files

The TFX template provides basic scaffold files to build a pipeline, including Python source code, sample data, and Jupyter Notebooks to analyze the output of the pipeline. The taxi template uses the same Chicago Taxi dataset and ML model as the Airflow Tutorial.

Here is a brief introduction to each of the Python files.

  • pipeline - This directory contains the definition of the pipeline.
    • configs.py — defines common constants for pipeline runners.
    • pipeline.py — defines TFX components and a pipeline.
  • models - This directory contains ML model definitions.
    • features.py, features_test.py — define features for the model.
    • preprocessing.py, preprocessing_test.py — define preprocessing jobs using tf.Transform.
    • estimator - This directory contains an Estimator-based model.
      • constants.py — defines constants of the model.
      • model.py, model_test.py — define a DNN model using TF Estimator.
    • keras - This directory contains a Keras-based model.
      • constants.py — defines constants of the model.
      • model.py, model_test.py — define a DNN model using Keras.
  • beam_dag_runner.py, kubeflow_dag_runner.py — define runners for each orchestration engine.

You might notice that some files have _test.py in their name. These are unit tests of the pipeline, and it is recommended to add more unit tests as you implement your own pipelines. You can run a unit test by supplying the module name of the test file with the -m flag. You can usually get a module name by deleting the .py extension and replacing each / with a dot. For example:

!{sys.executable} -m models.features_test
!{sys.executable} -m models.keras.model_test

Running tests under Python 3.6.9: /tmpfs/src/tf_docs_env/bin/python
[ RUN      ] FeaturesTest.testNumberOfBucketFeatureBucketCount
[       OK ] FeaturesTest.testNumberOfBucketFeatureBucketCount
[ RUN      ] FeaturesTest.testTransformedNames
[       OK ] FeaturesTest.testTransformedNames
[ RUN      ] FeaturesTest.test_session
[  SKIPPED ] FeaturesTest.test_session
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK (skipped=1)
Running tests under Python 3.6.9: /tmpfs/src/tf_docs_env/bin/python
[ RUN      ] ModelTest.testBuildKerasModel
(TensorFlow GPU initialization logs and Keras model summaries omitted for brevity.)
[       OK ] ModelTest.testBuildKerasModel
[ RUN      ] ModelTest.test_session
[  SKIPPED ] ModelTest.test_session
----------------------------------------------------------------------
Ran 2 tests in 8.753s

OK (skipped=1)
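The path-to-module conversion described above can be sketched as a small Python helper. This is a hypothetical illustration, not part of the template:

```python
def module_name(test_file_path):
    """Convert a test file path like 'models/features_test.py' into a
    module name suitable for `python -m`."""
    if not test_file_path.endswith('.py'):
        raise ValueError('expected a .py file: ' + test_file_path)
    # Drop the .py extension, then replace path separators with dots.
    return test_file_path[:-len('.py')].replace('/', '.')

print(module_name('models/features_test.py'))     # models.features_test
print(module_name('models/keras/model_test.py'))  # models.keras.model_test
```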

Step 4. Run your first TFX pipeline

Components in the TFX pipeline generate outputs for each run as ML Metadata artifacts, and these need to be stored somewhere. You can use any storage the KFP cluster can access; for this example we will use Google Cloud Storage (GCS). A default GCS bucket should have been created automatically. Its name will be <your-project-id>-kubeflowpipelines-default.
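The bucket name and upload path follow directly from your project ID. A trivial sketch, with a placeholder project ID you would replace with your own:

```python
GOOGLE_CLOUD_PROJECT = 'my-gcp-project'  # placeholder; use your real project ID

# Default bucket created by AI Platform Pipelines: <project-id>-kubeflowpipelines-default
BUCKET_NAME = GOOGLE_CLOUD_PROJECT + '-kubeflowpipelines-default'
DATA_PATH = 'gs://{}/tfx-template/data/data.csv'.format(BUCKET_NAME)
print(DATA_PATH)
# gs://my-gcp-project-kubeflowpipelines-default/tfx-template/data/data.csv
```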

Let's upload our sample data to the GCS bucket so that we can use it in our pipeline later.

!gsutil cp data/data.csv gs://{GOOGLE_CLOUD_PROJECT}-kubeflowpipelines-default/tfx-template/data/data.csv

Let's create a TFX pipeline using the tfx pipeline create command.

!tfx pipeline create  \
--pipeline-path=kubeflow_dag_runner.py \
--endpoint={ENDPOINT} \
--build-target-image={CUSTOM_TFX_IMAGE}
CLI
Creating pipeline

While creating a pipeline, Dockerfile and build.yaml will be generated to build a Docker image. Don't forget to add these files to your source control system (for example, git) along with the other source files.

A pipeline definition file for Argo will be generated, too. The name of this file is ${PIPELINE_NAME}.tar.gz. For example, it will be my_pipeline.tar.gz if the name of your pipeline is my_pipeline. It is recommended NOT to include this pipeline definition file in source control, because it is generated from other Python files and will be updated whenever you update the pipeline. For your convenience, this file is already listed in the automatically generated .gitignore.

Now start an execution run with the newly created pipeline using the tfx run create command.

!tfx run create --pipeline-name={PIPELINE_NAME} --endpoint={ENDPOINT}
CLI
Creating a run for pipeline: my_pipeline

You can also run the pipeline from the KFP Dashboard. The new execution run will be listed under Experiments in the KFP Dashboard. Clicking into the experiment lets you monitor progress and visualize the artifacts created during the execution run.

We recommend visiting the KFP Dashboard. You can access it from the Cloud AI Platform Pipelines menu in Google Cloud Console. Once you visit the dashboard, you will be able to find the pipeline and access a wealth of information about it. For example, you can find your runs under the Experiments menu, and when you open an execution run under Experiments you can find all the artifacts from the pipeline under the Artifacts menu.

One of the major sources of failure is permission-related problems. Make sure your KFP cluster has permission to access Google Cloud APIs. This can be configured when you create a KFP cluster in GCP; see also the Troubleshooting document in GCP.

Step 5. Add components for data validation.

In this step, you will add components for data validation, including StatisticsGen, SchemaGen, and ExampleValidator. If you are interested in data validation, please see Get started with TensorFlow Data Validation.

Double-click to change the directory to pipeline and double-click again to open pipeline.py. Find and uncomment the 3 lines which add StatisticsGen, SchemaGen, and ExampleValidator to the pipeline. (Tip: search for comments containing TODO(step 5):.) Make sure to save pipeline.py after you edit it.

You now need to update the existing pipeline with the modified pipeline definition. Use the tfx pipeline update command to update your pipeline, followed by the tfx run create command to create a new execution run of your updated pipeline.

# Update the pipeline
!tfx pipeline update \
--pipeline-path=kubeflow_dag_runner.py \
--endpoint={ENDPOINT}
# You can run the pipeline the same way.
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}
CLI
Updating pipeline
CLI
Creating a run for pipeline: my_pipeline

Check pipeline outputs

Visit the KFP dashboard to find pipeline outputs in the page for your pipeline run. Click the Experiments tab on the left, and All runs in the Experiments page. You should be able to find the latest run under the name of your pipeline.

Step 6. Add components for training.

In this step, you will add components for training and model validation, including Transform, Trainer, ResolverNode, Evaluator, and Pusher.

Double-click to open pipeline.py. Find and uncomment the 5 lines which add Transform, Trainer, ResolverNode, Evaluator, and Pusher to the pipeline. (Tip: search for TODO(step 6):.)
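As a rough sketch of two of these components (based on the TFX 0.22 template; Trainer, Evaluator, and Pusher are wired up in the same append-to-components style):

```python
# Performs transformations and feature engineering for training and serving.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    preprocessing_fn=preprocessing_fn)
components.append(transform)

# Gets the latest blessed model so Evaluator can compare against it.
model_resolver = ResolverNode(
    instance_name='latest_blessed_model_resolver',
    resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
    model=Channel(type=Model),
    model_blessing=Channel(type=ModelBlessing))
components.append(model_resolver)
```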

As you did before, you now need to update the existing pipeline with the modified pipeline definition. The instructions are the same as Step 5. Update the pipeline using tfx pipeline update, and create an execution run using tfx run create.

!tfx pipeline update \
--pipeline-path=kubeflow_dag_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}
2020-07-29 09:09:25.959691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Updating pipeline
Detected Beam.
beam runner not found in dsl.
2020-07-29 09:09:31.126766: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
Pipeline "my_pipeline" does not exist.

When this execution run finishes successfully, you have now created and run your first TFX pipeline in AI Platform Pipelines!

Step 7. (Optional) Try BigQueryExampleGen

BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. BigQuery can be used as a source for training examples in TFX. In this step, we will add BigQueryExampleGen to the pipeline.

Double-click to open pipeline.py. Comment out CsvExampleGen and uncomment the line which creates an instance of BigQueryExampleGen. You also need to uncomment the query argument of the create_pipeline function.

We need to specify which GCP project to use for BigQuery, and this is done by setting --project in beam_pipeline_args when creating a pipeline.
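In the 0.22 template's configs.py, the relevant definition looks approximately like the fragment below; GOOGLE_CLOUD_PROJECT and GCS_BUCKET_NAME are set elsewhere in the same file:

```python
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
]
```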

Double-click to open configs.py. Uncomment the definitions of GOOGLE_CLOUD_REGION, BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS, and BIG_QUERY_QUERY. Replace the region value in this file with the correct value for your GCP project.

Change directory one level up. Click the name of the directory above the file list. The name of the directory is the name of the pipeline, which is my_pipeline if you didn't change it.

Double-click to open kubeflow_dag_runner.py. Uncomment two arguments, query and beam_pipeline_args, for the create_pipeline function.

Now the pipeline is ready to use BigQuery as an example source. Update the pipeline as before and create a new execution run as we did in steps 5 and 6.

!tfx pipeline update \
--pipeline-path=kubeflow_dag_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}
2020-07-29 09:09:36.184395: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Updating pipeline
Detected Beam.
beam runner not found in dsl.
2020-07-29 09:09:41.212155: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
Pipeline "my_pipeline" does not exist.

Step 8. (Optional) Try Dataflow with KFP

Several TFX components use Apache Beam to implement data-parallel pipelines, which means that you can distribute data processing workloads using Google Cloud Dataflow. In this step, we will set the Kubeflow orchestrator to use Dataflow as the data processing back-end for Apache Beam.

Double-click pipeline to change directory, and double-click to open configs.py. Uncomment the definitions of GOOGLE_CLOUD_REGION and DATAFLOW_BEAM_PIPELINE_ARGS.
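The uncommented Dataflow configuration in the 0.22 template looks roughly like this (your template may include additional tuning flags; GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME, and GOOGLE_CLOUD_REGION are defined elsewhere in configs.py):

```python
DATAFLOW_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--runner=DataflowRunner',
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
    '--region=' + GOOGLE_CLOUD_REGION,
]
```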

Double-click to open pipeline.py. Change the value of enable_cache to False.

Change directory one level up. Click the name of the directory above the file list. The name of the directory is the name of the pipeline, which is my_pipeline if you didn't change it.

Double-click to open kubeflow_dag_runner.py. Uncomment beam_pipeline_args. (Also make sure to comment out current beam_pipeline_args that you added in Step 7.)

Note that we deliberately disabled caching: because we have already run the pipeline successfully, every component would return a cached execution result if caching were enabled.

Now the pipeline is ready to use Dataflow. Update the pipeline and create an execution run as we did in steps 5 and 6.

!tfx pipeline update \
--pipeline-path=kubeflow_dag_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}
2020-07-29 09:09:46.288762: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Updating pipeline
Detected Beam.
beam runner not found in dsl.
2020-07-29 09:09:51.384825: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
Pipeline "my_pipeline" does not exist.

You can find your Dataflow jobs in Dataflow in Cloud Console.

To benefit from caching of execution results again, double-click to open pipeline.py and reset the value of enable_cache to True.

Step 9. (Optional) Try Cloud AI Platform Training and Prediction with KFP

TFX interoperates with several managed GCP services, such as Cloud AI Platform for Training and Prediction. You can set your Trainer component to use Cloud AI Platform Training, a managed service for training ML models. Moreover, when your model is built and ready to be served, you can push it to Cloud AI Platform Prediction for serving. In this step, we will set our Trainer and Pusher components to use Cloud AI Platform services.

Before editing the files, you might first need to enable the AI Platform Training & Prediction API.

Double-click pipeline to change directory, and double-click to open configs.py. Uncomment the definitions of GOOGLE_CLOUD_REGION, GCP_AI_PLATFORM_TRAINING_ARGS, and GCP_AI_PLATFORM_SERVING_ARGS. We will use our custom-built container image to train a model in Cloud AI Platform Training, so we should set masterConfig.imageUri in GCP_AI_PLATFORM_TRAINING_ARGS to the same value as CUSTOM_TFX_IMAGE above.
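The uncommented definitions in the 0.22 template look approximately like this fragment (GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_REGION, PIPELINE_NAME, and CUSTOM_TFX_IMAGE are defined elsewhere in the file):

```python
GCP_AI_PLATFORM_TRAINING_ARGS = {
    'project': GOOGLE_CLOUD_PROJECT,
    'region': GOOGLE_CLOUD_REGION,
    # Point masterConfig.imageUri at the custom TFX image built earlier.
    'masterConfig': {'imageUri': CUSTOM_TFX_IMAGE},
}

GCP_AI_PLATFORM_SERVING_ARGS = {
    'model_name': PIPELINE_NAME,
    'project_id': GOOGLE_CLOUD_PROJECT,
    'regions': [GOOGLE_CLOUD_REGION],
}
```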

Change directory one level up, and double-click to open kubeflow_dag_runner.py. Uncomment ai_platform_training_args and ai_platform_serving_args.

Update the pipeline and create an execution run as we did in steps 5 and 6.

!tfx pipeline update \
--pipeline-path=kubeflow_dag_runner.py \
--endpoint={ENDPOINT}
!tfx run create --pipeline-name {PIPELINE_NAME} --endpoint={ENDPOINT}
2020-07-29 09:09:56.394051: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Updating pipeline
Detected Beam.
beam runner not found in dsl.
2020-07-29 09:10:01.455915: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
CLI
Creating a run for pipeline: my_pipeline
Detected Beam.
Pipeline "my_pipeline" does not exist.

You can find your training jobs in Cloud AI Platform Jobs. If your pipeline completed successfully, you can find your model in Cloud AI Platform Models.

Step 10. Ingest YOUR data to the pipeline

We made a pipeline for a model using the Chicago Taxi dataset. Now it's time to put your data into the pipeline.

Your data can be stored anywhere your pipeline can access, including GCS or BigQuery. You will need to modify the pipeline definition to access your data.

  1. If your data is stored in files, modify the DATA_PATH in kubeflow_dag_runner.py or beam_dag_runner.py and set it to the location of your files. If your data is stored in BigQuery, modify BIG_QUERY_QUERY in pipeline/configs.py to correctly query for your data.
  2. Add features in models/features.py.
  3. Modify models/preprocessing.py to transform input data for training.
  4. Modify models/keras/model.py and models/keras/constants.py to describe your ML model.
    • You can use an Estimator-based model, too. Change the RUN_FN constant to models.estimator.model.run_fn in pipeline/configs.py.
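Step 2, adding features, might look like the following sketch. The column names here are hypothetical placeholders in the style of the taxi template; replace them with the columns of your own dataset.

```python
# Sketch of models/features.py, following the taxi template's conventions.
# These column names are placeholders; use the ones from YOUR dataset.
DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare', 'trip_seconds']
VOCAB_FEATURE_KEYS = ['payment_type', 'company']
LABEL_KEY = 'tips'


def transformed_name(key):
  # The template appends a suffix to tell transformed features apart
  # from their raw counterparts.
  return key + '_xf'
```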

Please see the Trainer component guide for more information.

Cleaning up

To clean up all Google Cloud resources used in this project, you can delete the Google Cloud project you used for the tutorial.

Alternatively, you can clean up individual resources by visiting each console: