Better ML Engineering with ML Metadata

Assume a scenario where you have set up a production ML pipeline to classify penguins. The pipeline ingests your training data, trains and evaluates a model, and pushes it to production.

However, when you later try to use this model with a larger dataset that contains different kinds of penguins, you notice that the model does not behave as expected and starts classifying the species incorrectly.

At this point, you are interested in knowing:

What is the most efficient way to debug the model when the only available artifact is the model in production?
Which training dataset was used to train the model?
Which training run led to this erroneous model?
Where are the model evaluation results?
Where should debugging begin?

ML Metadata (MLMD) is a library that uses the metadata associated with ML models to help you answer these questions and more. A helpful analogy is to think of this metadata as the equivalent of logging in software development. MLMD lets you reliably track the artifacts and lineage associated with the various components of your ML pipeline.

In this tutorial, you set up a TFX pipeline to create a model that classifies penguins into three species based on their body mass, the length and depth of their culmens, and the length of their flippers. You then use MLMD to track the lineage of the pipeline components.

TFX Pipelines in Colab

Colab is a lightweight development environment that differs significantly from a production environment. In production, you may have various pipeline components such as data ingestion, transformation, model training, and run histories spread across multiple distributed systems. For this tutorial, be aware that orchestration and metadata storage differ significantly from production: both are handled locally within Colab. See the TFX documentation to learn more about running TFX in Colab.

Setup

First, we install and import the necessary packages, set up paths, and download the data.

Upgrade Pip

To avoid upgrading Pip on a system when running locally, check that we are running in Colab first. Local systems can of course be upgraded separately.

try:
  import colab
  !pip install --upgrade pip
except ImportError:
  # Not running in Colab, so leave the local pip installation untouched.
  pass
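As a minimal sketch of the same check outside a notebook cell, the helper below tests whether the `google.colab` module is already loaded in the running interpreter. The function name `running_in_colab` is hypothetical, and relying on `sys.modules` assumes that the Colab kernel preloads that module; the tutorial itself does not specify this.

```python
import sys


def running_in_colab() -> bool:
  # Assumption: the Colab kernel preloads google.colab, so its presence
  # in sys.modules signals a Colab runtime.
  return 'google.colab' in sys.modules


if running_in_colab():
  print('Colab detected: upgrading pip here only affects the hosted runtime.')
else:
  print('Not in Colab: leave the local pip installation alone.')
```

A check like this keeps the upgrade decision in one place, so other setup steps can reuse it.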

Install and import TFX

!pip install -q -U tfx

Import packages

Did you restart the runtime?

If you are using Google Colab, the first time you run the cell above you must restart the runtime, either by clicking the "RESTART RUNTIME" button above or by using the "Runtime > Restart runtime..." menu. This is necessary because of the way Colab loads packages.

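import os
import tempfile
# Import the submodule explicitly; a bare `import urllib` does not
# guarantee that urllib.request is loaded.
import urllib.request
import pandas as pd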

import tensorflow_model_analysis as tfma
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

Check the TFX and MLMD versions.

from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
import ml_metadata as mlmd
print('MLMD version: {}'.format(mlmd.__version__))

TFX version: 1.4.0
MLMD version: 1.4.0
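The two versions printed above match. As a small sketch, assuming that agreement on the major and minor components is the property worth checking (the tutorial does not state a compatibility rule), the hypothetical helper `versions_aligned` below compares two version strings:

```python
def major_minor(version: str) -> tuple:
  # '1.4.0' -> (1, 4); components after the second are ignored.
  major, minor = version.split('.')[:2]
  return int(major), int(minor)


def versions_aligned(tfx_version: str, mlmd_version: str) -> bool:
  # Hypothetical rule: the two libraries agree on major.minor.
  return major_minor(tfx_version) == major_minor(mlmd_version)


print(versions_aligned('1.4.0', '1.4.0'))
```

In a notebook you could pass `tfx.__version__` and `mlmd.__version__` to the same function.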

Download the dataset

In this Colab, we use the Palmer Penguins dataset, which is available on GitHub. We processed the dataset by leaving out any incomplete records, dropping the island and sex columns, and converting the labels to int32. The dataset contains 334 records of the body mass, the culmen length and depth, and the flipper length of penguins. You use this data to classify penguins into one of three species.
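The preprocessing described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script that produced the processed file: the sample rows, the `'NA'` marker for missing values, and the species-to-integer mapping in `SPECIES_TO_INT` are all hypothetical, and a plain Python `int` stands in for int32.

```python
import csv
import io

# Hypothetical raw rows; 'NA' marks a missing value in this sketch.
RAW_CSV = """species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,NA,NA,NA,NA,NA
Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
"""

# Assumed label encoding; the processed file may use a different mapping.
SPECIES_TO_INT = {'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2}
DROPPED_COLUMNS = ('island', 'sex')


def preprocess(raw_csv: str) -> list:
  rows = []
  for row in csv.DictReader(io.StringIO(raw_csv)):
    # Leave out incomplete records.
    if any(value in ('', 'NA') for value in row.values()):
      continue
    # Drop the columns that are not used as features.
    for column in DROPPED_COLUMNS:
      del row[column]
    # Convert the label to an integer class id.
    row['species'] = SPECIES_TO_INT[row['species']]
    rows.append(row)
  return rows


for record in preprocess(RAW_CSV):
  print(record)
```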

DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'
_data_root = tempfile.mkdtemp(prefix='tfx-data')
_data_filepath = os.path.join(_data_root, "penguins_processed.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

('/tmp/tfx-datal9104odr/penguins_processed.csv',
 <http.client.HTTPMessage at 0x7f9c6d8d2290>)

Create an InteractiveContext

To run TFX components interactively in this notebook, create an InteractiveContext. The InteractiveContext uses a temporary directory with an ephemeral MLMD database instance. Note that calls to InteractiveContext are no-ops outside the Colab environment.

In general, it is good practice to group similar pipeline runs under a Context.

interactive_context = InteractiveContext()

WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8 as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/metadata.sqlite.

Construct the TFX Pipeline

A TFX pipeline consists of several components that perform different aspects of the ML workflow. In this notebook, you create and run the ExampleGen, StatisticsGen, SchemaGen, and Trainer components, and use the Evaluator and Pusher components to evaluate and push the trained model.

Refer to the components tutorial for more information on TFX pipeline components.

Instantiate and run the ExampleGen Component

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
interactive_context.run(example_gen)

WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

Instantiate and run the StatisticsGen Component

statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
interactive_context.run(statistics_gen)

WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.

Instantiate and run the SchemaGen Component

infer_schema = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'], infer_feature_shape=True)
interactive_context.run(infer_schema)

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1205 11:16:00.941947  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type

Instantiate and run the Trainer Component

# Define the module file for the Trainer component
trainer_module_file = 'penguin_trainer.py'

%%writefile {trainer_module_file}

# Define the training algorithm for the Trainer module file
import os
from typing import List, Text

import tensorflow as tf
from tensorflow import keras

from tfx import v1 as tfx
from tfx_bsl.public import tfxio

from tensorflow_metadata.proto.v0 import schema_pb2

# Features used for classification - culmen length and depth, flipper length,
# body mass, and species.

_LABEL_KEY = 'species'

_FEATURE_KEYS = [
    'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_g'
]


def _input_fn(file_pattern: List[Text],
              data_accessor: tfx.components.DataAccessor,
              schema: schema_pb2.Schema, batch_size: int) -> tf.data.Dataset:
  return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY), schema).repeat()


def _build_keras_model():
  inputs = [keras.layers.Input(shape=(1,), name=f) for f in _FEATURE_KEYS]
  d = keras.layers.concatenate(inputs)
  d = keras.layers.Dense(8, activation='relu')(d)
  d = keras.layers.Dense(8, activation='relu')(d)
  outputs = keras.layers.Dense(3)(d)
  model = keras.Model(inputs=inputs, outputs=outputs)
  model.compile(
      optimizer=keras.optimizers.Adam(1e-2),
      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[keras.metrics.SparseCategoricalAccuracy()])
  return model


def run_fn(fn_args: tfx.components.FnArgs):
  schema = schema_pb2.Schema()
  tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema)
  train_dataset = _input_fn(
      fn_args.train_files, fn_args.data_accessor, schema, batch_size=10)
  eval_dataset = _input_fn(
      fn_args.eval_files, fn_args.data_accessor, schema, batch_size=10)
  model = _build_keras_model()
  model.fit(
      train_dataset,
      epochs=int(fn_args.train_steps / 20),
      steps_per_epoch=20,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)
  model.save(fn_args.serving_model_dir, save_format='tf')

Writing penguin_trainer.py
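The final `Dense(3)` layer in the module above has no activation and the loss is built with `from_logits=True`, so the model emits raw logits rather than probabilities. The sketch below shows, in plain Python, one way to turn such logits into probabilities and a predicted class index. The logit values are hypothetical, and the helper names are illustrative rather than part of Keras or TFX.

```python
import math


def softmax(logits):
  # Subtract the maximum for numerical stability before exponentiating.
  peak = max(logits)
  exps = [math.exp(x - peak) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def predicted_class(logits):
  # The argmax over logits equals the argmax over softmax probabilities.
  return max(range(len(logits)), key=lambda i: logits[i])


example_logits = [2.0, 0.5, -1.0]  # Hypothetical output for one penguin.
probabilities = softmax(example_logits)
print(predicted_class(example_logits), [round(p, 3) for p in probabilities])
```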

Run the Trainer component.

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(trainer_module_file),
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=50))
interactive_context.run(trainer)

running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying penguin_trainer.py -> build/lib
installing to /tmp/tmpum1crtxy
running install
running install_lib
copying build/lib/penguin_trainer.py -> /tmp/tmpum1crtxy
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmpum1crtxy/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3.7.egg-info
running install_scripts
creating /tmp/tmpum1crtxy/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL
creating '/tmp/tmpo87nn6ey/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl' and adding '/tmp/tmpum1crtxy' to it
adding 'penguin_trainer.py'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/RECORD'
removing /tmp/tmpum1crtxy
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
listing git files failed - pretending there aren't any
I1205 11:16:01.389324  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1205 11:16:01.392832  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
Processing /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/_wheels/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4
Epoch 1/5
20/20 [==============================] - 1s 11ms/step - loss: 0.9891 - sparse_categorical_accuracy: 0.4300 - val_loss: 0.9594 - val_sparse_categorical_accuracy: 0.4800
Epoch 2/5
20/20 [==============================] - 0s 6ms/step - loss: 0.8369 - sparse_categorical_accuracy: 0.6350 - val_loss: 0.7484 - val_sparse_categorical_accuracy: 0.8200
Epoch 3/5
20/20 [==============================] - 0s 6ms/step - loss: 0.5289 - sparse_categorical_accuracy: 0.8350 - val_loss: 0.5068 - val_sparse_categorical_accuracy: 0.7800
Epoch 4/5
20/20 [==============================] - 0s 6ms/step - loss: 0.4481 - sparse_categorical_accuracy: 0.7800 - val_loss: 0.4125 - val_sparse_categorical_accuracy: 0.8600
Epoch 5/5
20/20 [==============================] - 0s 6ms/step - loss: 0.3068 - sparse_categorical_accuracy: 0.8650 - val_loss: 0.3279 - val_sparse_categorical_accuracy: 0.8300
2021-12-05 11:16:06.493168: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/Trainer/model/4/Format-Serving/assets
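The log reports five epochs because `run_fn` divides `train_steps` by a fixed `steps_per_epoch` of 20, and the Trainer was configured with `num_steps=100`. The sketch below reproduces only that arithmetic; `epochs_for` is a hypothetical helper name, not part of TFX.

```python
STEPS_PER_EPOCH = 20  # Matches steps_per_epoch in run_fn.


def epochs_for(train_steps: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> int:
  # Same arithmetic as run_fn: int() truncates any partial epoch.
  return int(train_steps / steps_per_epoch)


print(epochs_for(100))  # 5, matching "Epoch 5/5" in the log above
print(epochs_for(110))  # 5, the extra 10 steps are dropped
```

One consequence of the truncation is that any `num_steps` below 20 yields zero epochs, so values that are multiples of 20 are the safest choice with this module file.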

Evaluate and push the model

Use the Evaluator component to evaluate and 'bless' the model before using the Pusher component to push the model to a serving directory.

_serving_model_dir = os.path.join(tempfile.mkdtemp(),
                                  'serving_model/penguins_classification')

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='species', signature_name='serving_default')
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='SparseCategoricalAccuracy',
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.6})))
        ])
    ],
    slicing_specs=[tfma.SlicingSpec()])

evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    schema=infer_schema.outputs['schema'],
    eval_config=eval_config)
interactive_context.run(evaluator)

I1205 11:16:07.075275  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1205 11:16:07.078761  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
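The `eval_config` above sets a lower bound of 0.6 on `SparseCategoricalAccuracy`, and the Pusher only pushes a model that the Evaluator has blessed. The toy function below is a stand-in for that gate, not the TFMA implementation: it assumes the bound is inclusive, and the accuracy values passed to it are illustrative.

```python
def should_push(metric_value: float, lower_bound: float = 0.6) -> bool:
  # Assumption: the bound is inclusive, so a value equal to it passes.
  return metric_value >= lower_bound


for accuracy in (0.83, 0.43):
  print(accuracy, 'push' if should_push(accuracy) else 'hold back')
```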

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
interactive_context.run(pusher)

I1205 11:16:11.935312  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type

Running the TFX pipeline creates an MLMD database. In the next section, you use the MLMD API to query this database for metadata information.

Query the MLMD Database

The MLMD database stores three types of metadata:

Metadata about the pipeline and the lineage information associated with the pipeline components
Metadata about artifacts that were generated during the pipeline run
Metadata about the executions of the pipeline
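To make the three kinds of metadata concrete, here is a toy in-memory model of them. It is only a sketch: the class names echo MLMD concepts, but the fields, the URIs, and the ids are hypothetical and much simpler than the real MLMD records.

```python
from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
  INPUT = 'INPUT'    # An execution consumed the artifact.
  OUTPUT = 'OUTPUT'  # An execution produced the artifact.


@dataclass
class Artifact:
  id: int
  type_name: str
  uri: str


@dataclass
class Execution:
  id: int
  type_name: str
  properties: dict = field(default_factory=dict)


@dataclass
class Event:
  # Lineage is recorded as links between artifacts and executions.
  artifact_id: int
  execution_id: int
  type: EventType


# A hypothetical two-step lineage: examples -> Trainer -> model.
examples = Artifact(1, 'Examples', './CsvExampleGen/examples/1')
model = Artifact(2, 'Model', './Trainer/model/4')
training = Execution(10, 'Trainer', {'train_steps': 100})
events = [
    Event(examples.id, training.id, EventType.INPUT),
    Event(model.id, training.id, EventType.OUTPUT),
]
print(len(events), 'events link', training.type_name, 'to its artifacts')
```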

A typical production pipeline serves multiple models as new data arrives. When you encounter erroneous results in served models, you can query the MLMD database to isolate the erroneous models. You can then trace the lineage of the pipeline components that correspond to these models in order to debug them.

Set up the metadata (MD) store with the InteractiveContext defined previously to query the MLMD database.

connection_config = interactive_context.metadata_connection_config
store = mlmd.MetadataStore(connection_config)

# All TFX artifacts are stored in the base directory
base_dir = connection_config.sqlite.filename_uri.split('metadata.sqlite')[0]
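The `base_dir` expression above is plain string handling. The sketch below applies it to the SQLite path printed in the InteractiveContext warning earlier in this notebook, then shortens an artifact URI the way the helper functions in the next cell do. The artifact URI is built by hand here purely for illustration.

```python
# Path taken from the InteractiveContext warning earlier in this notebook.
filename_uri = ('/tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/'
                'metadata.sqlite')

# Everything before the database file name is the pipeline root directory.
base_dir = filename_uri.split('metadata.sqlite')[0]
print(base_dir)

# Artifact URIs can then be displayed relative to that directory.
artifact_uri = base_dir + 'Trainer/model/4'
print(artifact_uri.replace(base_dir, './'))
```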

Create some helper functions to view the data from the MD store.

def display_types(types):
  # Helper function to render dataframes for the artifact and execution types
  table = {'id': [], 'name': []}
  for a_type in types:
    table['id'].append(a_type.id)
    table['name'].append(a_type.name)
  return pd.DataFrame(data=table)

def display_artifacts(store, artifacts):
  # Helper function to render dataframes for the input artifacts
  table = {'artifact id': [], 'type': [], 'uri': []}
  for a in artifacts:
    table['artifact id'].append(a.id)
    artifact_type = store.get_artifact_types_by_id([a.type_id])[0]
    table['type'].append(artifact_type.name)
    table['uri'].append(a.uri.replace(base_dir, './'))
  return pd.DataFrame(data=table)

def display_properties(store, node):
  # Helper function to render dataframes for artifact and execution properties
  table = {'property': [], 'value': []}
  for k, v in node.properties.items():
    table['property'].append(k)
    table['value'].append(
        v.string_value if v.HasField('string_value') else v.int_value)
  for k, v in node.custom_properties.items():
    table['property'].append(k)
    table['value'].append(
        v.string_value if v.HasField('string_value') else v.int_value)
  return pd.DataFrame(data=table)

First, query the MD store for a list of all its stored ArtifactTypes.

display_types(store.get_artifact_types())

Next, query all PushedModel artifacts.

pushed_models = store.get_artifacts_by_type("PushedModel")
display_artifacts(store, pushed_models)

Query the MD store for the latest pushed model. This tutorial has only one pushed model.

pushed_model = pushed_models[-1]
display_properties(store, pushed_model)

One of the first steps in debugging a pushed model is to find out which trained model was pushed and which training data was used to train it.

MLMD provides traversal APIs to walk through the provenance graph, which you can use to analyze model provenance.

def get_one_hop_parent_artifacts(store, artifacts):
  # Get a list of artifacts within a 1-hop of the artifacts of interest
  artifact_ids = [artifact.id for artifact in artifacts]
  executions_ids = set(
      event.execution_id
      for event in store.get_events_by_artifact_ids(artifact_ids)
      if event.type == mlmd.proto.Event.OUTPUT)
  artifacts_ids = set(
      event.artifact_id
      for event in store.get_events_by_execution_ids(executions_ids)
      if event.type == mlmd.proto.Event.INPUT)
  return [artifact for artifact in store.get_artifacts_by_id(artifacts_ids)]
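The two-step logic of `get_one_hop_parent_artifacts` (find the executions that produced an artifact, then find what those executions consumed) can be tried without a metadata store. The sketch below mirrors it over a hypothetical in-memory event list; the `Event` tuple and the string event types are simplifications, not the MLMD protos.

```python
from collections import namedtuple

Event = namedtuple('Event', ['artifact_id', 'execution_id', 'type'])

# Hypothetical lineage: artifact 1 -> execution 10 -> artifact 2
#                       artifact 2 -> execution 11 -> artifact 3
EVENTS = [
    Event(1, 10, 'INPUT'),
    Event(2, 10, 'OUTPUT'),
    Event(2, 11, 'INPUT'),
    Event(3, 11, 'OUTPUT'),
]


def one_hop_parents(events, artifact_ids):
  # Step 1: executions that produced any artifact of interest.
  producers = {
      e.execution_id for e in events
      if e.artifact_id in artifact_ids and e.type == 'OUTPUT'
  }
  # Step 2: artifacts that those executions consumed.
  return sorted({
      e.artifact_id for e in events
      if e.execution_id in producers and e.type == 'INPUT'
  })


print(one_hop_parents(EVENTS, {3}))  # [2]
print(one_hop_parents(EVENTS, {2}))  # [1]
```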

Query the parent artifacts for the pushed model.

parent_artifacts = get_one_hop_parent_artifacts(store, [pushed_model])
display_artifacts(store, parent_artifacts)

Query the properties for the model.

exported_model = parent_artifacts[0]
display_properties(store, exported_model)

Query the upstream artifacts for the model.

model_parents = get_one_hop_parent_artifacts(store, [exported_model])
display_artifacts(store, model_parents)

Get the training data that the model was trained with.

used_data = model_parents[0]
display_properties(store, used_data)
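The steps above apply the one-hop query twice by hand. As a sketch of how the same idea generalizes, the breadth-first walk below repeats the one-hop step until no new ancestors appear. The `PARENTS` map is a hypothetical stand-in for the results of the one-hop queries, and its artifact names and edges are illustrative only.

```python
from collections import deque

# Hypothetical parent map: artifact name -> names of its one-hop parents.
PARENTS = {
    'pushed_model': ['model', 'model_blessing'],
    'model_blessing': ['model', 'examples'],
    'model': ['examples', 'schema'],
    'schema': ['statistics'],
    'statistics': ['examples'],
    'examples': [],
}


def all_ancestors(parents, start):
  # Breadth-first walk that applies the one-hop step until nothing new appears.
  seen, order = {start}, []
  queue = deque([start])
  while queue:
    current = queue.popleft()
    for parent in parents.get(current, []):
      if parent not in seen:
        seen.add(parent)
        order.append(parent)
        queue.append(parent)
  return order


print(all_ancestors(PARENTS, 'pushed_model'))
```

Tracking visited nodes matters because the same artifact, such as the examples here, is usually an input to several executions.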

Now that you have the training data that the model was trained with, query the database again to find the training step (execution). Query the MD store for a list of the registered execution types.

display_types(store.get_execution_types())

The training step is the ExecutionType named tfx.components.trainer.component.Trainer. Traverse the MD store to get the trainer run that corresponds to the pushed model.

def find_producer_execution(store, artifact):
  executions_ids = set(
      event.execution_id
      for event in store.get_events_by_artifact_ids([artifact.id])
      if event.type == mlmd.proto.Event.OUTPUT)
  return store.get_executions_by_id(executions_ids)[0]

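# Use a distinct name so the Trainer component defined earlier is not shadowed.
trainer_execution = find_producer_execution(store, exported_model)
display_properties(store, trainer_execution)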

Summary

In this tutorial, you learned how to use MLMD to trace the lineage of your TFX pipeline components and resolve issues.

To learn more about how to use MLMD, check out these additional resources: