Better ML Engineering with ML Metadata

Assume a scenario where you set up a production ML pipeline to classify penguins. The pipeline ingests your training data, trains and evaluates a model, and pushes it to production.

However, when you later try to use this model with a larger dataset that contains different kinds of penguins, you observe that the model does not behave as expected and starts classifying the species incorrectly.

At this point, you are interested in knowing:

What is the most efficient way to debug the model when the only available artifact is the model in production?
Which training dataset was used to train the model?
Which training run led to this erroneous model?
Where are the model evaluation results?
Where should debugging begin?

ML Metadata (MLMD) is a library that leverages the metadata associated with ML models to help you answer these questions and more. A helpful analogy is to think of this metadata as the equivalent of logging in software development. MLMD enables you to reliably track the artifacts and lineage associated with the various components of your ML pipeline.

In this tutorial, you set up a TFX Pipeline to create a model that classifies penguins into three species based on their body mass, the length and depth of their culmens, and the length of their flippers. You then use MLMD to track the lineage of the pipeline components.

TFX Pipelines in Colab

Colab is a lightweight development environment that differs significantly from a production environment. In production, you may have various pipeline components such as data ingestion, transformation, model training, and run histories spread across multiple distributed systems. For this tutorial, be aware that orchestration and metadata storage differ significantly from production: both are handled locally within Colab. Learn more about TFX in Colab here.

Setup

First, we install and import the necessary packages, set up paths, and download data.

Upgrade Pip

To avoid upgrading Pip on a system when running locally, check that we are running in Colab. Local systems can of course be upgraded separately.

try:
  import colab
  !pip install --upgrade pip
except:
  pass
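
As a possible alternative to the bare try/except above, the same check can be written without swallowing unrelated errors. This is only a sketch: the `google.colab` module name and the `running_in_colab` helper are assumptions and are not part of the original notebook.

```python
import importlib.util


def running_in_colab() -> bool:
    """Return True if the (assumed) google.colab module can be located."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # The parent package `google` is missing or malformed: not Colab.
        return False


if running_in_colab():
    print("Running in Colab: upgrading pip only affects this runtime.")
else:
    print("Not running in Colab: leaving the system pip untouched.")
```

The function only looks the module up; it does not import it, so it has no side effects on a local system.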

Install and import TFX

pip install -q -U tfx

Import packages

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime by clicking the "RESTART RUNTIME" button above or by using the "Runtime > Restart runtime..." menu. This is because of the way that Colab loads packages.

import os
import tempfile
import urllib.request
import pandas as pd

import tensorflow_model_analysis as tfma
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

Check the TFX and MLMD versions.

from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
import ml_metadata as mlmd
print('MLMD version: {}'.format(mlmd.__version__))

TFX version: 1.4.0
MLMD version: 1.4.0

Download the dataset

In this colab, we use the Palmer Penguins dataset, which can be found on Github. We processed the dataset by leaving out any incomplete records, dropping the island and sex columns, and converting the labels to int32. The dataset contains 334 records of the body mass, the length and depth of the penguins' culmens, and the length of their flippers. You use this data to classify penguins into one of three species.

DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'
_data_root = tempfile.mkdtemp(prefix='tfx-data')
_data_filepath = os.path.join(_data_root, "penguins_processed.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

('/tmp/tfx-datal9104odr/penguins_processed.csv',
 <http.client.HTTPMessage at 0x7f9c6d8d2290>)
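
Before building the pipeline, it can help to confirm that the downloaded file has the columns the trainer expects. The following is a minimal sketch using only the standard library. The `summarize_csv` helper and the two-row stand-in file with made-up values are assumptions for illustration; in the notebook you would pass `_data_filepath` instead of `demo_path`.

```python
import csv
import os
import tempfile

# Column names used later by the trainer module (features plus label).
EXPECTED_COLUMNS = {
    "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm",
    "body_mass_g", "species",
}


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


# Stand-in file with made-up values, so the sketch runs without a download.
demo_dir = tempfile.mkdtemp(prefix="csv-check")
demo_path = os.path.join(demo_dir, "demo.csv")
with open(demo_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["species", "culmen_length_mm", "culmen_depth_mm",
                     "flipper_length_mm", "body_mass_g"])
    writer.writerow([0, 0.25, 0.66, 0.15, 0.29])
    writer.writerow([2, 0.61, 0.20, 0.79, 0.69])

columns, num_rows = summarize_csv(demo_path)
missing = EXPECTED_COLUMNS - set(columns)
print(f"{num_rows} rows, missing columns: {sorted(missing)}")
```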

Create an InteractiveContext

To run TFX components interactively in this notebook, create an InteractiveContext. The InteractiveContext uses a temporary directory with an ephemeral MLMD database instance. Note that calls to InteractiveContext are no-ops outside the Colab environment.

In general, it is good practice to group similar pipeline runs under a Context.

interactive_context = InteractiveContext()

WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8 as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/metadata.sqlite.

Construct the TFX Pipeline

A TFX pipeline consists of several components that perform different aspects of the ML workflow. In this notebook, you create and run the ExampleGen, StatisticsGen, SchemaGen, and Trainer components, and use the Evaluator and Pusher components to evaluate and push the trained model.

Refer to the components tutorial for more information on TFX pipeline components.

Instantiate and run the ExampleGen Component

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
interactive_context.run(example_gen)

WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

Instantiate and run the StatisticsGen Component

statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
interactive_context.run(statistics_gen)

WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.

Instantiate and run the SchemaGen Component

infer_schema = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'], infer_feature_shape=True)
interactive_context.run(infer_schema)

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1205 11:16:00.941947  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type

Instantiate and run the Trainer Component

# Define the module file for the Trainer component
trainer_module_file = 'penguin_trainer.py'

%%writefile {trainer_module_file}

# Define the training algorithm for the Trainer module file
import os
from typing import List, Text

import tensorflow as tf
from tensorflow import keras

from tfx import v1 as tfx
from tfx_bsl.public import tfxio

from tensorflow_metadata.proto.v0 import schema_pb2

# Features used for classification: culmen length and depth, flipper length,
# and body mass. The label to predict is the species.

_LABEL_KEY = 'species'

_FEATURE_KEYS = [
    'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_g'
]


def _input_fn(file_pattern: List[Text],
              data_accessor: tfx.components.DataAccessor,
              schema: schema_pb2.Schema, batch_size: int) -> tf.data.Dataset:
  return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY), schema).repeat()


def _build_keras_model():
  inputs = [keras.layers.Input(shape=(1,), name=f) for f in _FEATURE_KEYS]
  d = keras.layers.concatenate(inputs)
  d = keras.layers.Dense(8, activation='relu')(d)
  d = keras.layers.Dense(8, activation='relu')(d)
  outputs = keras.layers.Dense(3)(d)
  model = keras.Model(inputs=inputs, outputs=outputs)
  model.compile(
      optimizer=keras.optimizers.Adam(1e-2),
      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[keras.metrics.SparseCategoricalAccuracy()])
  return model


def run_fn(fn_args: tfx.components.FnArgs):
  schema = schema_pb2.Schema()
  tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema)
  train_dataset = _input_fn(
      fn_args.train_files, fn_args.data_accessor, schema, batch_size=10)
  eval_dataset = _input_fn(
      fn_args.eval_files, fn_args.data_accessor, schema, batch_size=10)
  model = _build_keras_model()
  model.fit(
      train_dataset,
      epochs=int(fn_args.train_steps / 20),
      steps_per_epoch=20,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)
  model.save(fn_args.serving_model_dir, save_format='tf')

Writing penguin_trainer.py
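
In run_fn, the number of epochs is derived from train_steps using a fixed 20 steps per epoch. The arithmetic can be sketched as follows. The `epochs_for` helper is hypothetical and uses integer division, which gives the same result as `int(train_steps / 20)` for non-negative step counts.

```python
STEPS_PER_EPOCH = 20  # same value as steps_per_epoch in run_fn


def epochs_for(train_steps: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> int:
    """Number of whole epochs that fit in train_steps (remainder is dropped)."""
    return train_steps // steps_per_epoch


print(epochs_for(100))  # 5
print(epochs_for(110))  # 5: the extra 10 steps do not form a full epoch
print(epochs_for(10))   # 0: fewer than 20 steps yields zero epochs
```

With num_steps=100 in the Trainer's train_args, as used in this tutorial, this yields the 5 epochs that appear in the training log.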

Run the Trainer component.

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(trainer_module_file),
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=50))
interactive_context.run(trainer)

running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying penguin_trainer.py -> build/lib
installing to /tmp/tmpum1crtxy
running install
running install_lib
copying build/lib/penguin_trainer.py -> /tmp/tmpum1crtxy
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmpum1crtxy/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3.7.egg-info
running install_scripts
creating /tmp/tmpum1crtxy/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL
creating '/tmp/tmpo87nn6ey/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl' and adding '/tmp/tmpum1crtxy' to it
adding 'penguin_trainer.py'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/RECORD'
removing /tmp/tmpum1crtxy
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
listing git files failed - pretending there aren't any
I1205 11:16:01.389324  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1205 11:16:01.392832  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
Processing /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/_wheels/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4
Epoch 1/5
20/20 [==============================] - 1s 11ms/step - loss: 0.9891 - sparse_categorical_accuracy: 0.4300 - val_loss: 0.9594 - val_sparse_categorical_accuracy: 0.4800
Epoch 2/5
20/20 [==============================] - 0s 6ms/step - loss: 0.8369 - sparse_categorical_accuracy: 0.6350 - val_loss: 0.7484 - val_sparse_categorical_accuracy: 0.8200
Epoch 3/5
20/20 [==============================] - 0s 6ms/step - loss: 0.5289 - sparse_categorical_accuracy: 0.8350 - val_loss: 0.5068 - val_sparse_categorical_accuracy: 0.7800
Epoch 4/5
20/20 [==============================] - 0s 6ms/step - loss: 0.4481 - sparse_categorical_accuracy: 0.7800 - val_loss: 0.4125 - val_sparse_categorical_accuracy: 0.8600
Epoch 5/5
20/20 [==============================] - 0s 6ms/step - loss: 0.3068 - sparse_categorical_accuracy: 0.8650 - val_loss: 0.3279 - val_sparse_categorical_accuracy: 0.8300
2021-12-05 11:16:06.493168: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/Trainer/model/4/Format-Serving/assets

Evaluate and push the model

Use the Evaluator component to evaluate and "bless" the model before using the Pusher component to push the model to a serving directory.
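
The blessing decision can be pictured as a simple threshold check. The sketch below is a simplified stand-in and not the TFMA implementation: the `is_blessed` helper is hypothetical, and it assumes the lower bound is inclusive. The value 0.6 is the lower bound that this tutorial's eval_config places on SparseCategoricalAccuracy.

```python
def is_blessed(metric_value: float, lower_bound: float) -> bool:
    """Hypothetical check: the model passes if the metric reaches the bound."""
    return metric_value >= lower_bound


LOWER_BOUND = 0.6  # from GenericValueThreshold(lower_bound={'value': 0.6})

# Made-up accuracy values to show both outcomes.
for accuracy in (0.55, 0.60, 0.83):
    status = "blessed" if is_blessed(accuracy, LOWER_BOUND) else "not blessed"
    print(f"accuracy={accuracy:.2f} -> {status}")
```

The Pusher receives the Evaluator's blessing output and only pushes a model that has been blessed.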

_serving_model_dir = os.path.join(tempfile.mkdtemp(),
                                  'serving_model/penguins_classification')

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='species', signature_name='serving_default')
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='SparseCategoricalAccuracy',
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.6})))
        ])
    ],
    slicing_specs=[tfma.SlicingSpec()])

evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    schema=infer_schema.outputs['schema'],
    eval_config=eval_config)
interactive_context.run(evaluator)

I1205 11:16:07.075275  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1205 11:16:07.078761  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
interactive_context.run(pusher)

I1205 11:16:11.935312  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type

Running the TFX pipeline populates the MLMD database. In the next section, you use the MLMD API to query this database for metadata information.

Query the MLMD Database

The MLMD database stores three types of metadata:

Metadata about the pipeline and the lineage information associated with the pipeline components
Metadata about artifacts that were generated during the pipeline run
Metadata about the executions of the pipeline
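
One way to picture these three kinds of records is as a small in-memory data model. The classes below are a hypothetical illustration and are not the MLMD API; the context name and URI are made-up example values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """Something a pipeline step reads or writes (data, schema, model)."""
    id: int
    type_name: str
    uri: str


@dataclass
class Execution:
    """One run of a pipeline component."""
    id: int
    type_name: str


@dataclass
class Context:
    """Groups the artifacts and executions that belong to one pipeline run."""
    name: str
    artifacts: List[Artifact] = field(default_factory=list)
    executions: List[Execution] = field(default_factory=list)


run = Context(name="penguin-pipeline-run")  # hypothetical name
run.executions.append(Execution(id=1, type_name="CsvExampleGen"))
run.artifacts.append(
    Artifact(id=1, type_name="Examples", uri="./CsvExampleGen/examples/1"))

print(f"{run.name}: {len(run.artifacts)} artifact(s), "
      f"{len(run.executions)} execution(s)")
```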

A typical production pipeline serves multiple models as new data arrives. When you encounter erroneous results in served models, you can query the MLMD database to isolate the erroneous models. You can then trace the lineage of the pipeline components that correspond to these models to debug your models.

Set up the metadata (MD) store with the InteractiveContext defined previously to query the MLMD database.

connection_config = interactive_context.metadata_connection_config
store = mlmd.MetadataStore(connection_config)

# All TFX artifacts are stored in the base directory
base_dir = connection_config.sqlite.filename_uri.split('metadata.sqlite')[0]
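
To see what the split on 'metadata.sqlite' produces, here is a small standalone sketch using the SQLite path from the InteractiveContext warning shown earlier. The artifact URI is an illustrative assumption; the helper functions in this tutorial shorten URIs in the same way, by replacing base_dir with './'.

```python
# Path taken from the InteractiveContext warning shown earlier in this tutorial.
filename_uri = ('/tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/'
                'metadata.sqlite')

# Everything before the database file name is the pipeline root directory.
base_dir = filename_uri.split('metadata.sqlite')[0]

# Illustrative artifact URI under that root (an assumption for this sketch).
artifact_uri = base_dir + 'Trainer/model/4'

print(base_dir)
print(artifact_uri.replace(base_dir, './'))
```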

Create some helper functions to view the data from the MD store.

def display_types(types):
  # Helper function to render dataframes for the artifact and execution types
  table = {'id': [], 'name': []}
  for a_type in types:
    table['id'].append(a_type.id)
    table['name'].append(a_type.name)
  return pd.DataFrame(data=table)

def display_artifacts(store, artifacts):
  # Helper function to render dataframes for the input artifacts
  table = {'artifact id': [], 'type': [], 'uri': []}
  for a in artifacts:
    table['artifact id'].append(a.id)
    artifact_type = store.get_artifact_types_by_id([a.type_id])[0]
    table['type'].append(artifact_type.name)
    table['uri'].append(a.uri.replace(base_dir, './'))
  return pd.DataFrame(data=table)

def display_properties(store, node):
  # Helper function to render dataframes for artifact and execution properties
  table = {'property': [], 'value': []}
  for k, v in node.properties.items():
    table['property'].append(k)
    table['value'].append(
        v.string_value if v.HasField('string_value') else v.int_value)
  for k, v in node.custom_properties.items():
    table['property'].append(k)
    table['value'].append(
        v.string_value if v.HasField('string_value') else v.int_value)
  return pd.DataFrame(data=table)

First, query the MD store for a list of all its stored ArtifactTypes.

display_types(store.get_artifact_types())

Next, query all PushedModel artifacts.

pushed_models = store.get_artifacts_by_type("PushedModel")
display_artifacts(store, pushed_models)

Query the MD store for the latest pushed model. This tutorial has only one pushed model.

pushed_model = pushed_models[-1]
display_properties(store, pushed_model)
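
Indexing with [-1] relies on the order in which the store returns artifacts. If you would rather select the latest model explicitly, one option is to compare creation timestamps. The sketch below assumes each artifact exposes a create_time_since_epoch field (milliseconds since the epoch), as MLMD Artifact protos do; the objects are stand-ins with made-up ids and timestamps.

```python
from types import SimpleNamespace

# Stand-ins for MLMD artifacts; only the two fields used here are modeled.
pushed_models = [
    SimpleNamespace(id=7, create_time_since_epoch=1638702971000),
    SimpleNamespace(id=3, create_time_since_epoch=1638616571000),
]

# Pick the artifact with the most recent creation time.
latest = max(pushed_models, key=lambda a: a.create_time_since_epoch)
print(latest.id)  # 7
```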

One of the first steps in debugging a pushed model is to look at which trained model was pushed and to see which training data was used to train that model.

MLMD provides traversal APIs to walk through the provenance graph, which you can use to analyze the model's provenance.

def get_one_hop_parent_artifacts(store, artifacts):
  # Get a list of artifacts within a 1-hop of the artifacts of interest
  artifact_ids = [artifact.id for artifact in artifacts]
  executions_ids = set(
      event.execution_id
      for event in store.get_events_by_artifact_ids(artifact_ids)
      if event.type == mlmd.proto.Event.OUTPUT)
  artifacts_ids = set(
      event.artifact_id
      for event in store.get_events_by_execution_ids(executions_ids)
      if event.type == mlmd.proto.Event.INPUT)
  return [artifact for artifact in store.get_artifacts_by_id(artifacts_ids)]
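
The traversal logic can be tried without a metadata store. The sketch below mirrors the two steps of the function above on a hand-written event list: find the executions that produced the artifacts, then collect the inputs of those executions. The ids and the `one_hop_parents` helper are made up for illustration, and the graph is simplified (a real Trainer or Pusher execution has more inputs).

```python
# Events as (artifact_id, execution_id, kind) tuples; all ids are made up.
EVENTS = [
    (1, 10, "OUTPUT"),  # ExampleGen run 10 produced Examples 1
    (1, 11, "INPUT"),   # Trainer run 11 consumed Examples 1
    (2, 11, "OUTPUT"),  # Trainer run 11 produced Model 2
    (2, 12, "INPUT"),   # Pusher run 12 consumed Model 2
    (3, 12, "OUTPUT"),  # Pusher run 12 produced PushedModel 3
]


def one_hop_parents(artifact_ids):
    """Return ids of artifacts consumed by the executions that produced these."""
    producers = {e for a, e, kind in EVENTS
                 if a in artifact_ids and kind == "OUTPUT"}
    return {a for a, e, kind in EVENTS
            if e in producers and kind == "INPUT"}


print(one_hop_parents({3}))  # {2}
print(one_hop_parents({2}))  # {1}
print(one_hop_parents({1}))  # set()
```

Applying the function repeatedly walks from the pushed model back to the model and then to the training examples, which is the same path the following queries take against the real store.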

Query the parent artifacts for the pushed model.

parent_artifacts = get_one_hop_parent_artifacts(store, [pushed_model])
display_artifacts(store, parent_artifacts)

Query the properties for the model.

exported_model = parent_artifacts[0]
display_properties(store, exported_model)

Query the upstream artifacts for the model.

model_parents = get_one_hop_parent_artifacts(store, [exported_model])
display_artifacts(store, model_parents)

Get the training data the model was trained with.

used_data = model_parents[0]
display_properties(store, used_data)

Now that you have the training data that the model was trained with, query the database again to find the training step (execution). Query the MD store for a list of the registered execution types.

display_types(store.get_execution_types())

The training step is the ExecutionType named tfx.components.trainer.component.Trainer. Traverse the MD store to get the trainer run that corresponds to the pushed model.

def find_producer_execution(store, artifact):
  executions_ids = set(
      event.execution_id
      for event in store.get_events_by_artifact_ids([artifact.id])
      if event.type == mlmd.proto.Event.OUTPUT)
  return store.get_executions_by_id(executions_ids)[0]

trainer = find_producer_execution(store, exported_model)
display_properties(store, trainer)

Summary

In this tutorial, you learned how you can leverage MLMD to trace the lineage of your TFX pipeline components and resolve issues.

To learn more about how to use MLMD, check out these additional resources: