Retraining an image classifier


Introduction

Image classification models have millions of parameters. Training them from scratch requires a lot of labeled training data and a lot of computing power. Transfer learning is a technique that shortcuts much of this by taking a piece of a model that has already been trained on a related task and reusing it in a new model.

This Colab demonstrates how to build a Keras model for classifying five species of flowers by using a pre-trained TF2 SavedModel from TensorFlow Hub for image feature extraction, trained on the much larger and more general ImageNet dataset. Optionally, the feature extractor can be trained ("fine-tuned") alongside the newly added classifier.

Looking for a tool instead?

This is a TensorFlow coding tutorial. If you just want a tool that builds the TensorFlow or TFLite model for you, take a look at the make_image_classifier command-line tool that gets installed by the PIP package tensorflow-hub[make_image_classifier], or at this TFLite colab.

Setup

import itertools
import os

import matplotlib.pylab as plt
import numpy as np

import tensorflow as tf
import tensorflow_hub as hub

print("TF version:", tf.__version__)
print("Hub version:", hub.__version__)
print("GPU is", "available" if tf.test.is_gpu_available() else "NOT AVAILABLE")
2021-07-10 11:09:41.908423: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
TF version: 2.5.0
Hub version: 0.12.0
WARNING:tensorflow:From /tmp/ipykernel_26701/2333497024.py:12: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
GPU is available
2021-07-10 11:09:43.415190: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-10 11:09:43.416714: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-10 11:09:44.071543: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:44.072487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-10 11:09:44.072568: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-10 11:09:44.077666: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-10 11:09:44.077764: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-10 11:09:44.079203: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-10 11:09:44.079628: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-10 11:09:44.081174: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-07-10 11:09:44.082459: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-10 11:09:44.082713: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-10 11:09:44.082854: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:44.083790: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:44.084516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-10 11:09:44.084569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-10 11:09:44.670922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-10 11:09:44.670956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-10 11:09:44.670964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-10 11:09:44.671157: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:44.671783: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:44.672424: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:44.673061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)

Select the TF2 SavedModel module to use

For starters, use https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4. The same URL can be used in code to identify the SavedModel and in your browser to show its documentation. (Note that models in TF1 Hub format won't work here.)

You can find more TF2 models that generate image feature vectors here.

There are multiple possible models to try. All it takes is to select a different one in the cell below and follow along with the notebook.
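The model-selection cell itself is not reproduced on this page. A minimal stand-in consistent with the output shown below would set the handle and input size directly; the variable names model_name, model_handle, and IMAGE_SIZE are assumed here because later cells use them:

model_name = "efficientnetv2-s"  # any image feature-vector model from tfhub.dev works
model_handle = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_s/feature_vector/1"
pixels = 384  # EfficientNetV2-S expects 384x384 inputs; MobileNet V2 uses 224
IMAGE_SIZE = (pixels, pixels)

print("Selected model:", model_name, ":", model_handle)
print("Input size", IMAGE_SIZE)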

Selected model: efficientnetv2-s : https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_s/feature_vector/1
Input size (384, 384)

Set up the Flowers dataset

The inputs are suitably resized for the selected module. Dataset augmentation (i.e., random distortions of an image each time it is read) improves training, especially when fine-tuning.

data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
228818944/228813984 [==============================] - 7s 0us/step
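The cell that builds the training and validation datasets is likewise omitted here. A sketch consistent with the outputs below, assuming the names BATCH_SIZE, class_names, train_size, valid_size, train_ds, and val_ds that later cells rely on, could look like this (an 80/20 split, one-hot labels, pixels rescaled to [0, 1], and optional augmentation):

BATCH_SIZE = 16

def build_dataset(subset):
  return tf.keras.preprocessing.image_dataset_from_directory(
      data_dir,
      validation_split=0.20,
      subset=subset,
      label_mode="categorical",
      seed=123,  # a fixed seed keeps the train/validation split stable across runs
      image_size=IMAGE_SIZE,
      batch_size=1)

train_ds = build_dataset("training")
class_names = tuple(train_ds.class_names)
train_size = train_ds.cardinality().numpy()
train_ds = train_ds.unbatch().batch(BATCH_SIZE).repeat()

val_ds = build_dataset("validation")
valid_size = val_ds.cardinality().numpy()
val_ds = val_ds.unbatch().batch(BATCH_SIZE)

# Rescale pixel values to [0, 1]; random distortions can optionally be added
# on top of that for data augmentation.
normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1. / 255)
do_data_augmentation = False
preprocessing_model = tf.keras.Sequential([normalization_layer])
if do_data_augmentation:
  preprocessing_model.add(
      tf.keras.layers.experimental.preprocessing.RandomRotation(40 / 360))
  preprocessing_model.add(
      tf.keras.layers.experimental.preprocessing.RandomFlip(mode="horizontal"))
train_ds = train_ds.map(lambda images, labels: (preprocessing_model(images), labels))
val_ds = val_ds.map(lambda images, labels: (normalization_layer(images), labels))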

Found 3670 files belonging to 5 classes.
Using 2936 files for training.
2021-07-10 11:09:55.068048: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.068753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-10 11:09:55.068865: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.069469: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.070075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-10 11:09:55.070582: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.071188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-10 11:09:55.071258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.071913: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.072499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-10 11:09:55.072531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-10 11:09:55.072538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-10 11:09:55.072544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-10 11:09:55.072632: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.073239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:09:55.073809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
Found 3670 files belonging to 5 classes.
Using 734 files for validation.

Defining the model

All it takes is to put a linear classifier on top of the feature_extractor_layer with the Hub module.

For speed, we start out with a non-trainable feature_extractor_layer, but you can also enable fine-tuning for greater accuracy.

do_fine_tuning = False
print("Building model with", model_handle)
model = tf.keras.Sequential([
    # Explicitly define the input shape so the model can be properly
    # loaded by the TFLiteConverter
    tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE + (3,)),
    hub.KerasLayer(model_handle, trainable=do_fine_tuning),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(len(class_names),
                          kernel_regularizer=tf.keras.regularizers.l2(0.0001))
])
model.build((None,)+IMAGE_SIZE+(3,))
model.summary()
Building model with https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_s/feature_vector/1
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
keras_layer (KerasLayer)     (None, 1280)              20331360  
_________________________________________________________________
dropout (Dropout)            (None, 1280)              0         
_________________________________________________________________
dense (Dense)                (None, 5)                 6405      
=================================================================
Total params: 20,337,765
Trainable params: 6,405
Non-trainable params: 20,331,360
_________________________________________________________________

Training the model

model.compile(
  optimizer=tf.keras.optimizers.SGD(lr=0.005, momentum=0.9), 
  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True, label_smoothing=0.1),
  metrics=['accuracy'])
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  "The `lr` argument is deprecated, use `learning_rate` instead.")
steps_per_epoch = train_size // BATCH_SIZE
validation_steps = valid_size // BATCH_SIZE
hist = model.fit(
    train_ds,
    epochs=5, steps_per_epoch=steps_per_epoch,
    validation_data=val_ds,
    validation_steps=validation_steps).history
Epoch 1/5
2021-07-10 11:10:12.421017: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-07-10 11:10:12.421548: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2000194999 Hz
2021-07-10 11:10:21.888082: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-10 11:10:23.880143: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100
2021-07-10 11:10:29.026270: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-10 11:10:29.386777: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
183/183 [==============================] - 33s 87ms/step - loss: 0.8008 - accuracy: 0.8152 - val_loss: 0.6219 - val_accuracy: 0.9111
Epoch 2/5
183/183 [==============================] - 15s 79ms/step - loss: 0.6302 - accuracy: 0.9072 - val_loss: 0.5925 - val_accuracy: 0.9319
Epoch 3/5
183/183 [==============================] - 14s 79ms/step - loss: 0.5983 - accuracy: 0.9260 - val_loss: 0.5788 - val_accuracy: 0.9361
Epoch 4/5
183/183 [==============================] - 14s 79ms/step - loss: 0.5845 - accuracy: 0.9308 - val_loss: 0.5682 - val_accuracy: 0.9431
Epoch 5/5
183/183 [==============================] - 14s 78ms/step - loss: 0.5725 - accuracy: 0.9408 - val_loss: 0.5651 - val_accuracy: 0.9431
plt.figure()
plt.ylabel("Loss (training and validation)")
plt.xlabel("Training Steps")
plt.ylim([0,2])
plt.plot(hist["loss"])
plt.plot(hist["val_loss"])

plt.figure()
plt.ylabel("Accuracy (training and validation)")
plt.xlabel("Training Steps")
plt.ylim([0,1])
plt.plot(hist["accuracy"])
plt.plot(hist["val_accuracy"])
[<matplotlib.lines.Line2D at 0x7f37c0290e10>]

[Figure: training and validation loss vs. training steps]

[Figure: training and validation accuracy vs. training steps]

Try out the model on an image from the validation data:

x, y = next(iter(val_ds))
image = x[0, :, :, :]
true_index = np.argmax(y[0])
plt.imshow(image)
plt.axis('off')
plt.show()

# Expand the validation image to a batch of shape (1, height, width, 3) before predicting the label
prediction_scores = model.predict(np.expand_dims(image, axis=0))
predicted_index = np.argmax(prediction_scores)
print("True label: " + class_names[true_index])
print("Predicted label: " + class_names[predicted_index])

[Figure: the sampled validation image]

True label: sunflowers
Predicted label: sunflowers

Finally, the trained model can be saved for deployment to TF Serving or TFLite (on mobile) as follows.

saved_model_path = f"/tmp/saved_flowers_model_{model_name}"
tf.saved_model.save(model, saved_model_path)
2021-07-10 11:11:49.663732: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 855). These functions will not be directly callable after loading.
WARNING:tensorflow:FOR KERAS USERS: The object that you are saving contains one or more Keras models or layers. If you are loading the SavedModel with `tf.keras.models.load_model`, continue reading (otherwise, you may ignore the following instructions). Please change your code to save with `tf.keras.models.save_model` or `model.save`, and confirm that the file "keras.metadata" exists in the export directory. In the future, Keras will only load the SavedModels that have this file. In other words, `tf.saved_model.save` will no longer write SavedModels that can be recovered as Keras models (this will apply in TF 2.5).

FOR DEVS: If you are overwriting _tracking_metadata in your class, this property has been used to save metadata in the SavedModel. The metadta field will be deprecated soon, so please move the metadata to a different file.
INFO:tensorflow:Assets written to: /tmp/saved_flowers_model_efficientnetv2-s/assets

Optional: Deployment to TensorFlow Lite

TensorFlow Lite lets you deploy TensorFlow models to mobile and IoT devices. The code below shows how to convert the trained model to TFLite and apply post-training tools from the TensorFlow Model Optimization Toolkit. Finally, it runs it in the TFLite Interpreter to examine the resulting quality:

  • Conversion without optimization provides the same results as before (up to roundoff error).
  • Conversion with optimization without any data quantizes the model weights to 8 bits, but inference still uses floating-point computation for the neural network activations. This reduces model size roughly 4-fold and improves CPU latency on mobile devices.
  • On top of that, computation of the neural network activations can be quantized to 8-bit integers as well if a small reference dataset is provided to calibrate the quantization range. On a mobile device, this accelerates inference further and makes it possible to run on accelerators like Edge TPU.

Optimization settings
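The conversion cell is not shown on this page either. A sketch of it, assuming the flag names optimize_lite_model and num_calibration_examples and using the standard tf.lite.TFLiteConverter API, produces the lite_model_content that the interpreter below loads:

optimize_lite_model = False  # set True to apply post-training optimization
num_calibration_examples = 60  # used only when quantizing activations

representative_dataset = None
if optimize_lite_model and num_calibration_examples:
  # A generator yielding a few training images to calibrate the quantization range.
  representative_dataset = lambda: itertools.islice(
      ([image[None, ...]] for batch, _ in train_ds for image in batch),
      num_calibration_examples)

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
if optimize_lite_model:
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  if representative_dataset:  # also quantize activations, not just weights
    converter.representative_dataset = representative_dataset
lite_model_content = converter.convert()

with open(f"/tmp/lite_flowers_model_{model_name}.tflite", "wb") as f:
  f.write(lite_model_content)
print("Wrote %sTFLite model of %d bytes." %
      ("optimized " if optimize_lite_model else "", len(lite_model_content)))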

Wrote TFLite model of 80553236 bytes.
2021-07-10 11:12:18.591401: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:345] Ignored output_format.
2021-07-10 11:12:18.591450: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:348] Ignored drop_control_dependency.
2021-07-10 11:12:18.591456: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:354] Ignored change_concat_input_ranges.
2021-07-10 11:12:18.592361: I tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /tmp/saved_flowers_model_efficientnetv2-s
2021-07-10 11:12:18.733110: I tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2021-07-10 11:12:18.733153: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /tmp/saved_flowers_model_efficientnetv2-s
2021-07-10 11:12:18.733245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-10 11:12:18.733262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
2021-07-10 11:12:19.277214: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-07-10 11:12:20.495780: I tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /tmp/saved_flowers_model_efficientnetv2-s
2021-07-10 11:12:20.927889: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 2335531 microseconds.
2021-07-10 11:12:22.377509: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:210] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2021-07-10 11:12:24.006170: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:12:24.006616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:05.0 name: NVIDIA Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-07-10 11:12:24.006755: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:12:24.007069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:12:24.007322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-10 11:12:24.007365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-10 11:12:24.007372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2021-07-10 11:12:24.007391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2021-07-10 11:12:24.007491: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:12:24.007783: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-10 11:12:24.008071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14646 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:05.0, compute capability: 7.0)
interpreter = tf.lite.Interpreter(model_content=lite_model_content)
# This little helper wraps the TF Lite interpreter as a numpy-to-numpy function.
def lite_model(images):
  interpreter.allocate_tensors()
  interpreter.set_tensor(interpreter.get_input_details()[0]['index'], images)
  interpreter.invoke()
  return interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
num_eval_examples = 50 
eval_dataset = ((image, label)  # TFLite expects batch size 1.
                for batch in train_ds
                for (image, label) in zip(*batch))
count = 0
count_lite_tf_agree = 0
count_lite_correct = 0
for image, label in eval_dataset:
  probs_lite = lite_model(image[None, ...])[0]
  probs_tf = model(image[None, ...]).numpy()[0]
  y_lite = np.argmax(probs_lite)
  y_tf = np.argmax(probs_tf)
  y_true = np.argmax(label)
  count +=1
  if y_lite == y_tf: count_lite_tf_agree += 1
  if y_lite == y_true: count_lite_correct += 1
  if count >= num_eval_examples: break
print("TF Lite model agrees with original model on %d of %d examples (%g%%)." %
      (count_lite_tf_agree, count, 100.0 * count_lite_tf_agree / count))
print("TF Lite model is accurate on %d of %d examples (%g%%)." %
      (count_lite_correct, count, 100.0 * count_lite_correct / count))
TF Lite model agrees with original model on 50 of 50 examples (100%).
TF Lite model is accurate on 47 of 50 examples (94%).