Object Detection with TensorFlow Lite Model Maker


In this Colab notebook, you'll learn how to use the TensorFlow Lite Model Maker library to train a custom object detection model capable of detecting salads in images on a mobile device.

The Model Maker library uses transfer learning to simplify the process of training a TensorFlow Lite model on a custom dataset. Retraining a TensorFlow Lite model with your own dataset reduces the amount of training data required and shortens the training time.

You'll use the publicly available Salads dataset, which was created from the Open Images Dataset V4.

Each image in the dataset contains objects labeled as one of the following classes:

  • Baked Goods
  • Cheese
  • Salad
  • Seafood
  • Tomato

The dataset contains bounding boxes specifying where each object is located, together with the object's label.

Here is an example image from the dataset:


Prerequisites

Install the required packages

Start by installing the required packages, including the Model Maker package from the GitHub repo and the pycocotools library you'll use for evaluation.

pip install -q tensorflow==2.5.0
pip install -q --use-deprecated=legacy-resolver tflite-model-maker
pip install -q pycocotools

Import the required packages.

import numpy as np
import os

from tflite_model_maker.config import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import object_detector

import tensorflow as tf
assert tf.__version__.startswith('2')

tf.get_logger().setLevel('ERROR')
from absl import logging
logging.set_verbosity(logging.ERROR)

Prepare the dataset

Here you'll use the same dataset as the AutoML quickstart.

The Salads dataset is available at: gs://cloud-ml-data/img/openimage/csv/salads_ml_use.csv.

It contains 175 images for training, 25 images for validation, and 25 images for testing. The dataset has five classes: Salad, Seafood, Tomato, Baked goods, Cheese.


The dataset is provided in CSV format:

TRAINING,gs://cloud-ml-data/img/openimage/3/2520/3916261642_0a504acd60_o.jpg,Salad,0.0,0.0954,,,0.977,0.957,,
VALIDATION,gs://cloud-ml-data/img/openimage/3/2520/3916261642_0a504acd60_o.jpg,Seafood,0.0154,0.1538,,,1.0,0.802,,
TEST,gs://cloud-ml-data/img/openimage/3/2520/3916261642_0a504acd60_o.jpg,Tomato,0.0,0.655,,,0.231,0.839,,
  • Each row corresponds to an object localized inside a larger image, with each object designated as training, validation, or test data. You'll learn more about what that means later in this notebook.
  • The three lines included here indicate three distinct objects located inside the same image available at gs://cloud-ml-data/img/openimage/3/2520/3916261642_0a504acd60_o.jpg.
  • Each row has a different label: Salad, Seafood, Tomato, etc.
  • Bounding boxes are specified for each image using the top left and bottom right vertices.
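For illustration, here is how one of those rows maps to pixel coordinates. The `parse_row` and `to_pixels` helpers below are illustrative only, not part of Model Maker:

```python
import csv
import io

# One row of the dataset CSV: set,path,label,x_min,y_min,,,x_max,y_max,,
ROW = "TRAINING,gs://cloud-ml-data/img/openimage/3/2520/3916261642_0a504acd60_o.jpg,Salad,0.0,0.0954,,,0.977,0.957,,"

def parse_row(line):
    """Parse one CSV row into (split, path, label, box), where box holds
    the normalized (x_min, y_min, x_max, y_max) corner coordinates."""
    fields = next(csv.reader(io.StringIO(line)))
    split, path, label = fields[0], fields[1], fields[2]
    x_min, y_min = float(fields[3]), float(fields[4])  # top-left vertex
    x_max, y_max = float(fields[7]), float(fields[8])  # bottom-right vertex
    return split, path, label, (x_min, y_min, x_max, y_max)

def to_pixels(box, width, height):
    """Scale normalized corners to pixel coordinates for a given image size."""
    x_min, y_min, x_max, y_max = box
    return (round(x_min * width), round(y_min * height),
            round(x_max * width), round(y_max * height))

split, path, label, box = parse_row(ROW)
print(split, label, to_pixels(box, width=1000, height=800))
```

Model Maker does this parsing for you via `DataLoader.from_csv`; the sketch only shows what the columns mean.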

Here is a visualization of these three lines:


To learn more about how to prepare your own CSV file and the minimum requirements for creating a valid dataset, see the Preparing your training data guide.

If you are new to Google Cloud, you may wonder what the gs:// URLs mean. They are URLs of files stored on Google Cloud Storage (GCS). If you make your files on GCS public or authenticate your client, Model Maker can read those files just like local files.

However, you don't need to keep your images on Google Cloud to use Model Maker. You can use a local path in your CSV file and Model Maker will just work.
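For example, a CSV with local paths could look like this (the paths and box coordinates below are hypothetical):

```csv
TRAINING,/home/user/salads/img_001.jpg,Salad,0.05,0.10,,,0.90,0.85,,
TRAINING,/home/user/salads/img_001.jpg,Tomato,0.12,0.50,,,0.40,0.95,,
VALIDATION,/home/user/salads/img_042.jpg,Cheese,0.30,0.20,,,0.65,0.70,,
TEST,/home/user/salads/img_077.jpg,Seafood,0.00,0.15,,,0.55,0.80,,
```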

Quickstart

There are six steps to training an object detection model:

Step 1. Choose an object detection model architecture.

This tutorial uses the EfficientDet-Lite0 model. EfficientDet-Lite[0-4] are a family of mobile/IoT-friendly object detection models derived from the EfficientDet architecture.

Here is the performance of each EfficientDet-Lite model compared to the others.

Model architecture   Size(MB)*   Latency(ms)**   Average Precision***
EfficientDet-Lite0   4.4         37              25.69%
EfficientDet-Lite1   5.8         49              30.55%
EfficientDet-Lite2   7.2         69              33.97%
EfficientDet-Lite3   11.4        116             37.70%
EfficientDet-Lite4   19.9        260             41.96%

* Size of the integer quantized models.
** Latency measured on Pixel 4 using 4 threads on CPU.
*** Average Precision is the mAP (mean Average Precision) on the COCO 2017 validation dataset.
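One rough way to use this table is to pick the most accurate variant that fits your latency budget. The helper below is a hypothetical sketch over the numbers above; it is not part of Model Maker:

```python
# (size_mb, latency_ms, coco_map) taken from the table above.
MODELS = {
    'efficientdet_lite0': (4.4, 37, 25.69),
    'efficientdet_lite1': (5.8, 49, 30.55),
    'efficientdet_lite2': (7.2, 69, 33.97),
    'efficientdet_lite3': (11.4, 116, 37.70),
    'efficientdet_lite4': (19.9, 260, 41.96),
}

def pick_model(max_latency_ms):
    """Return the most accurate variant within a CPU-latency budget."""
    candidates = [(ap, name) for name, (_, ms, ap) in MODELS.items()
                  if ms <= max_latency_ms]
    return max(candidates)[1] if candidates else None

print(pick_model(100))  # best variant under 100 ms on a Pixel 4 CPU
```

This tutorial sticks with EfficientDet-Lite0, the smallest and fastest variant.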

spec = model_spec.get('efficientdet_lite0')

Step 2. Load the dataset.

Model Maker takes input data in CSV format. Use the object_detector.DataLoader.from_csv method to load the dataset and split it into training, validation, and test images.

  • Training images: These images are used to train the object detection model to recognize salad ingredients.
  • Validation images: These are images that the model didn't see during the training process. You'll use them to decide when you should stop the training, to avoid overfitting.
  • Test images: These images are used to evaluate the final model performance.
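Given the counts stated earlier (175/25/25), the split is roughly 78%/11%/11%:

```python
# Image counts per split, as stated for the Salads dataset.
counts = {'TRAINING': 175, 'VALIDATION': 25, 'TEST': 25}
total = sum(counts.values())  # 225 images overall

# Fraction of the dataset in each split.
fractions = {split: round(n / total, 3) for split, n in counts.items()}
print(fractions)  # {'TRAINING': 0.778, 'VALIDATION': 0.111, 'TEST': 0.111}
```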

You can load the CSV file directly from Google Cloud Storage, but you don't need to keep your images on Google Cloud to use Model Maker. You can specify a local CSV file on your computer, and Model Maker will work just fine.

train_data, validation_data, test_data = object_detector.DataLoader.from_csv('gs://cloud-ml-data/img/openimage/csv/salads_ml_use.csv')

Step 3. Train the TensorFlow model with the training data.

  • The EfficientDet-Lite0 model uses epochs = 50 by default, which means it will go through the training dataset 50 times. You can look at the validation accuracy during training and stop early to avoid overfitting.
  • Set batch_size = 8 here so you will see that it takes 21 steps to go through the 175 images in the training dataset.
  • Set train_whole_model=True to fine-tune the whole model instead of just training the head layer. This improves accuracy, at the cost of longer training time.

model = object_detector.create(train_data, model_spec=spec, batch_size=8, train_whole_model=True, validation_data=validation_data)
Epoch 1/50
21/21 [==============================] - 47s 560ms/step - det_loss: 1.7498 - cls_loss: 1.1207 - box_loss: 0.0126 - reg_l2_loss: 0.0635 - loss: 1.8133 - learning_rate: 0.0090 - gradient_norm: 0.7127 - val_det_loss: 1.6512 - val_cls_loss: 1.0942 - val_box_loss: 0.0111 - val_reg_l2_loss: 0.0635 - val_loss: 1.7148
Epoch 2/50
21/21 [==============================] - 9s 428ms/step - det_loss: 1.5749 - cls_loss: 1.0346 - box_loss: 0.0108 - reg_l2_loss: 0.0635 - loss: 1.6385 - learning_rate: 0.0100 - gradient_norm: 1.1286 - val_det_loss: 1.4898 - val_cls_loss: 0.9810 - val_box_loss: 0.0102 - val_reg_l2_loss: 0.0635 - val_loss: 1.5533
Epoch 3/50
21/21 [==============================] - 8s 399ms/step - det_loss: 1.3834 - cls_loss: 0.8835 - box_loss: 0.0100 - reg_l2_loss: 0.0635 - loss: 1.4470 - learning_rate: 0.0099 - gradient_norm: 1.6548 - val_det_loss: 1.5699 - val_cls_loss: 1.0917 - val_box_loss: 0.0096 - val_reg_l2_loss: 0.0636 - val_loss: 1.6335
Epoch 4/50
21/21 [==============================] - 11s 538ms/step - det_loss: 1.2581 - cls_loss: 0.7808 - box_loss: 0.0095 - reg_l2_loss: 0.0636 - loss: 1.3217 - learning_rate: 0.0099 - gradient_norm: 1.8710 - val_det_loss: 1.3915 - val_cls_loss: 0.9494 - val_box_loss: 0.0088 - val_reg_l2_loss: 0.0636 - val_loss: 1.4551
Epoch 5/50
21/21 [==============================] - 8s 402ms/step - det_loss: 1.1769 - cls_loss: 0.7312 - box_loss: 0.0089 - reg_l2_loss: 0.0636 - loss: 1.2405 - learning_rate: 0.0098 - gradient_norm: 1.8482 - val_det_loss: 1.2240 - val_cls_loss: 0.7897 - val_box_loss: 0.0087 - val_reg_l2_loss: 0.0636 - val_loss: 1.2875
Epoch 6/50
21/21 [==============================] - 8s 377ms/step - det_loss: 1.0785 - cls_loss: 0.6760 - box_loss: 0.0081 - reg_l2_loss: 0.0636 - loss: 1.1421 - learning_rate: 0.0097 - gradient_norm: 1.8346 - val_det_loss: 1.3397 - val_cls_loss: 0.9001 - val_box_loss: 0.0088 - val_reg_l2_loss: 0.0636 - val_loss: 1.4033
Epoch 7/50
21/21 [==============================] - 8s 404ms/step - det_loss: 1.0583 - cls_loss: 0.6720 - box_loss: 0.0077 - reg_l2_loss: 0.0636 - loss: 1.1219 - learning_rate: 0.0096 - gradient_norm: 1.8439 - val_det_loss: 1.2290 - val_cls_loss: 0.8042 - val_box_loss: 0.0085 - val_reg_l2_loss: 0.0636 - val_loss: 1.2926
Epoch 8/50
21/21 [==============================] - 8s 414ms/step - det_loss: 1.0155 - cls_loss: 0.6418 - box_loss: 0.0075 - reg_l2_loss: 0.0636 - loss: 1.0791 - learning_rate: 0.0094 - gradient_norm: 1.9218 - val_det_loss: 1.0643 - val_cls_loss: 0.6600 - val_box_loss: 0.0081 - val_reg_l2_loss: 0.0636 - val_loss: 1.1280
Epoch 9/50
21/21 [==============================] - 9s 427ms/step - det_loss: 0.9576 - cls_loss: 0.6097 - box_loss: 0.0070 - reg_l2_loss: 0.0636 - loss: 1.0212 - learning_rate: 0.0093 - gradient_norm: 1.7768 - val_det_loss: 1.2845 - val_cls_loss: 0.8958 - val_box_loss: 0.0078 - val_reg_l2_loss: 0.0636 - val_loss: 1.3481
Epoch 10/50
21/21 [==============================] - 9s 424ms/step - det_loss: 0.9062 - cls_loss: 0.5775 - box_loss: 0.0066 - reg_l2_loss: 0.0636 - loss: 0.9698 - learning_rate: 0.0091 - gradient_norm: 1.7403 - val_det_loss: 1.0913 - val_cls_loss: 0.7053 - val_box_loss: 0.0077 - val_reg_l2_loss: 0.0636 - val_loss: 1.1549
Epoch 11/50
21/21 [==============================] - 8s 369ms/step - det_loss: 0.9084 - cls_loss: 0.5879 - box_loss: 0.0064 - reg_l2_loss: 0.0636 - loss: 0.9720 - learning_rate: 0.0089 - gradient_norm: 2.0408 - val_det_loss: 1.0173 - val_cls_loss: 0.6315 - val_box_loss: 0.0077 - val_reg_l2_loss: 0.0636 - val_loss: 1.0810
Epoch 12/50
21/21 [==============================] - 8s 371ms/step - det_loss: 0.8794 - cls_loss: 0.5592 - box_loss: 0.0064 - reg_l2_loss: 0.0636 - loss: 0.9430 - learning_rate: 0.0087 - gradient_norm: 1.8005 - val_det_loss: 0.9659 - val_cls_loss: 0.5986 - val_box_loss: 0.0073 - val_reg_l2_loss: 0.0636 - val_loss: 1.0296
Epoch 13/50
21/21 [==============================] - 9s 459ms/step - det_loss: 0.8577 - cls_loss: 0.5512 - box_loss: 0.0061 - reg_l2_loss: 0.0636 - loss: 0.9214 - learning_rate: 0.0085 - gradient_norm: 2.0163 - val_det_loss: 0.9325 - val_cls_loss: 0.5910 - val_box_loss: 0.0068 - val_reg_l2_loss: 0.0636 - val_loss: 0.9961
Epoch 14/50
21/21 [==============================] - 9s 435ms/step - det_loss: 0.8478 - cls_loss: 0.5457 - box_loss: 0.0060 - reg_l2_loss: 0.0636 - loss: 0.9114 - learning_rate: 0.0082 - gradient_norm: 2.0608 - val_det_loss: 0.9593 - val_cls_loss: 0.6255 - val_box_loss: 0.0067 - val_reg_l2_loss: 0.0636 - val_loss: 1.0229
Epoch 15/50
21/21 [==============================] - 9s 441ms/step - det_loss: 0.8186 - cls_loss: 0.5299 - box_loss: 0.0058 - reg_l2_loss: 0.0637 - loss: 0.8823 - learning_rate: 0.0080 - gradient_norm: 2.0593 - val_det_loss: 0.9132 - val_cls_loss: 0.5886 - val_box_loss: 0.0065 - val_reg_l2_loss: 0.0637 - val_loss: 0.9769
Epoch 16/50
21/21 [==============================] - 8s 387ms/step - det_loss: 0.8167 - cls_loss: 0.5315 - box_loss: 0.0057 - reg_l2_loss: 0.0637 - loss: 0.8804 - learning_rate: 0.0077 - gradient_norm: 2.0913 - val_det_loss: 0.9579 - val_cls_loss: 0.6151 - val_box_loss: 0.0069 - val_reg_l2_loss: 0.0637 - val_loss: 1.0215
Epoch 17/50
21/21 [==============================] - 9s 455ms/step - det_loss: 0.8362 - cls_loss: 0.5329 - box_loss: 0.0061 - reg_l2_loss: 0.0637 - loss: 0.8999 - learning_rate: 0.0075 - gradient_norm: 2.2907 - val_det_loss: 0.9857 - val_cls_loss: 0.6500 - val_box_loss: 0.0067 - val_reg_l2_loss: 0.0637 - val_loss: 1.0493
Epoch 18/50
21/21 [==============================] - 10s 512ms/step - det_loss: 0.7982 - cls_loss: 0.5097 - box_loss: 0.0058 - reg_l2_loss: 0.0637 - loss: 0.8618 - learning_rate: 0.0072 - gradient_norm: 2.0724 - val_det_loss: 0.9486 - val_cls_loss: 0.6189 - val_box_loss: 0.0066 - val_reg_l2_loss: 0.0637 - val_loss: 1.0123
Epoch 19/50
21/21 [==============================] - 9s 426ms/step - det_loss: 0.7971 - cls_loss: 0.5132 - box_loss: 0.0057 - reg_l2_loss: 0.0637 - loss: 0.8608 - learning_rate: 0.0069 - gradient_norm: 2.1317 - val_det_loss: 0.8993 - val_cls_loss: 0.5856 - val_box_loss: 0.0063 - val_reg_l2_loss: 0.0637 - val_loss: 0.9629
Epoch 20/50
21/21 [==============================] - 8s 384ms/step - det_loss: 0.7721 - cls_loss: 0.5056 - box_loss: 0.0053 - reg_l2_loss: 0.0637 - loss: 0.8357 - learning_rate: 0.0066 - gradient_norm: 2.3111 - val_det_loss: 0.9005 - val_cls_loss: 0.5803 - val_box_loss: 0.0064 - val_reg_l2_loss: 0.0637 - val_loss: 0.9642
Epoch 21/50
21/21 [==============================] - 8s 407ms/step - det_loss: 0.7479 - cls_loss: 0.4783 - box_loss: 0.0054 - reg_l2_loss: 0.0637 - loss: 0.8116 - learning_rate: 0.0063 - gradient_norm: 2.1176 - val_det_loss: 0.8907 - val_cls_loss: 0.5672 - val_box_loss: 0.0065 - val_reg_l2_loss: 0.0637 - val_loss: 0.9544
Epoch 22/50
21/21 [==============================] - 9s 421ms/step - det_loss: 0.7528 - cls_loss: 0.4944 - box_loss: 0.0052 - reg_l2_loss: 0.0637 - loss: 0.8165 - learning_rate: 0.0060 - gradient_norm: 2.2042 - val_det_loss: 0.9323 - val_cls_loss: 0.5999 - val_box_loss: 0.0066 - val_reg_l2_loss: 0.0637 - val_loss: 0.9960
Epoch 23/50
21/21 [==============================] - 8s 411ms/step - det_loss: 0.7228 - cls_loss: 0.4738 - box_loss: 0.0050 - reg_l2_loss: 0.0637 - loss: 0.7865 - learning_rate: 0.0056 - gradient_norm: 2.2088 - val_det_loss: 0.9609 - val_cls_loss: 0.6336 - val_box_loss: 0.0065 - val_reg_l2_loss: 0.0637 - val_loss: 1.0246
Epoch 24/50
21/21 [==============================] - 8s 394ms/step - det_loss: 0.7517 - cls_loss: 0.4870 - box_loss: 0.0053 - reg_l2_loss: 0.0637 - loss: 0.8154 - learning_rate: 0.0053 - gradient_norm: 2.2922 - val_det_loss: 0.8865 - val_cls_loss: 0.5759 - val_box_loss: 0.0062 - val_reg_l2_loss: 0.0637 - val_loss: 0.9502
Epoch 25/50
21/21 [==============================] - 9s 445ms/step - det_loss: 0.7264 - cls_loss: 0.4772 - box_loss: 0.0050 - reg_l2_loss: 0.0637 - loss: 0.7901 - learning_rate: 0.0050 - gradient_norm: 2.4224 - val_det_loss: 0.8677 - val_cls_loss: 0.5584 - val_box_loss: 0.0062 - val_reg_l2_loss: 0.0637 - val_loss: 0.9314
Epoch 26/50
21/21 [==============================] - 8s 368ms/step - det_loss: 0.7313 - cls_loss: 0.4784 - box_loss: 0.0051 - reg_l2_loss: 0.0637 - loss: 0.7950 - learning_rate: 0.0047 - gradient_norm: 2.3990 - val_det_loss: 0.8848 - val_cls_loss: 0.5795 - val_box_loss: 0.0061 - val_reg_l2_loss: 0.0637 - val_loss: 0.9485
Epoch 27/50
21/21 [==============================] - 9s 438ms/step - det_loss: 0.7442 - cls_loss: 0.4808 - box_loss: 0.0053 - reg_l2_loss: 0.0637 - loss: 0.8079 - learning_rate: 0.0044 - gradient_norm: 2.2786 - val_det_loss: 0.8670 - val_cls_loss: 0.5623 - val_box_loss: 0.0061 - val_reg_l2_loss: 0.0637 - val_loss: 0.9307
Epoch 28/50
21/21 [==============================] - 7s 365ms/step - det_loss: 0.7101 - cls_loss: 0.4650 - box_loss: 0.0049 - reg_l2_loss: 0.0637 - loss: 0.7738 - learning_rate: 0.0040 - gradient_norm: 2.3402 - val_det_loss: 0.9498 - val_cls_loss: 0.6235 - val_box_loss: 0.0065 - val_reg_l2_loss: 0.0637 - val_loss: 1.0135
Epoch 29/50
21/21 [==============================] - 8s 413ms/step - det_loss: 0.7008 - cls_loss: 0.4541 - box_loss: 0.0049 - reg_l2_loss: 0.0637 - loss: 0.7646 - learning_rate: 0.0037 - gradient_norm: 2.4113 - val_det_loss: 0.9092 - val_cls_loss: 0.6099 - val_box_loss: 0.0060 - val_reg_l2_loss: 0.0637 - val_loss: 0.9729
Epoch 30/50
21/21 [==============================] - 8s 374ms/step - det_loss: 0.6828 - cls_loss: 0.4551 - box_loss: 0.0046 - reg_l2_loss: 0.0637 - loss: 0.7465 - learning_rate: 0.0034 - gradient_norm: 2.2994 - val_det_loss: 0.9009 - val_cls_loss: 0.6075 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9646
Epoch 31/50
21/21 [==============================] - 8s 396ms/step - det_loss: 0.7034 - cls_loss: 0.4578 - box_loss: 0.0049 - reg_l2_loss: 0.0637 - loss: 0.7671 - learning_rate: 0.0031 - gradient_norm: 2.5569 - val_det_loss: 0.8762 - val_cls_loss: 0.5866 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9399
Epoch 32/50
21/21 [==============================] - 8s 405ms/step - det_loss: 0.6809 - cls_loss: 0.4430 - box_loss: 0.0048 - reg_l2_loss: 0.0637 - loss: 0.7446 - learning_rate: 0.0028 - gradient_norm: 2.2210 - val_det_loss: 0.8479 - val_cls_loss: 0.5602 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9116
Epoch 33/50
21/21 [==============================] - 11s 547ms/step - det_loss: 0.6803 - cls_loss: 0.4439 - box_loss: 0.0047 - reg_l2_loss: 0.0637 - loss: 0.7440 - learning_rate: 0.0025 - gradient_norm: 2.3671 - val_det_loss: 0.8496 - val_cls_loss: 0.5612 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9133
Epoch 34/50
21/21 [==============================] - 7s 359ms/step - det_loss: 0.6711 - cls_loss: 0.4442 - box_loss: 0.0045 - reg_l2_loss: 0.0637 - loss: 0.7348 - learning_rate: 0.0023 - gradient_norm: 2.3403 - val_det_loss: 0.8600 - val_cls_loss: 0.5694 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9237
Epoch 35/50
21/21 [==============================] - 8s 357ms/step - det_loss: 0.7048 - cls_loss: 0.4478 - box_loss: 0.0051 - reg_l2_loss: 0.0637 - loss: 0.7685 - learning_rate: 0.0020 - gradient_norm: 2.4493 - val_det_loss: 0.8554 - val_cls_loss: 0.5704 - val_box_loss: 0.0057 - val_reg_l2_loss: 0.0637 - val_loss: 0.9191
Epoch 36/50
21/21 [==============================] - 8s 399ms/step - det_loss: 0.6647 - cls_loss: 0.4287 - box_loss: 0.0047 - reg_l2_loss: 0.0637 - loss: 0.7284 - learning_rate: 0.0018 - gradient_norm: 2.2872 - val_det_loss: 0.8561 - val_cls_loss: 0.5644 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9198
Epoch 37/50
21/21 [==============================] - 8s 415ms/step - det_loss: 0.6642 - cls_loss: 0.4333 - box_loss: 0.0046 - reg_l2_loss: 0.0637 - loss: 0.7279 - learning_rate: 0.0015 - gradient_norm: 2.3742 - val_det_loss: 0.8689 - val_cls_loss: 0.5745 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9326
Epoch 38/50
21/21 [==============================] - 9s 421ms/step - det_loss: 0.6635 - cls_loss: 0.4372 - box_loss: 0.0045 - reg_l2_loss: 0.0637 - loss: 0.7272 - learning_rate: 0.0013 - gradient_norm: 2.4729 - val_det_loss: 0.8648 - val_cls_loss: 0.5697 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9285
Epoch 39/50
21/21 [==============================] - 8s 381ms/step - det_loss: 0.6555 - cls_loss: 0.4249 - box_loss: 0.0046 - reg_l2_loss: 0.0637 - loss: 0.7192 - learning_rate: 0.0011 - gradient_norm: 2.3844 - val_det_loss: 0.8520 - val_cls_loss: 0.5581 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9157
Epoch 40/50
21/21 [==============================] - 9s 426ms/step - det_loss: 0.6551 - cls_loss: 0.4192 - box_loss: 0.0047 - reg_l2_loss: 0.0637 - loss: 0.7188 - learning_rate: 9.0029e-04 - gradient_norm: 2.2808 - val_det_loss: 0.8558 - val_cls_loss: 0.5631 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9195
Epoch 41/50
21/21 [==============================] - 8s 395ms/step - det_loss: 0.6549 - cls_loss: 0.4337 - box_loss: 0.0044 - reg_l2_loss: 0.0637 - loss: 0.7186 - learning_rate: 7.2543e-04 - gradient_norm: 2.4308 - val_det_loss: 0.8449 - val_cls_loss: 0.5519 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9086
Epoch 42/50
21/21 [==============================] - 11s 561ms/step - det_loss: 0.6683 - cls_loss: 0.4417 - box_loss: 0.0045 - reg_l2_loss: 0.0637 - loss: 0.7320 - learning_rate: 5.6814e-04 - gradient_norm: 2.3405 - val_det_loss: 0.8425 - val_cls_loss: 0.5496 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9062
Epoch 43/50
21/21 [==============================] - 8s 390ms/step - det_loss: 0.6756 - cls_loss: 0.4431 - box_loss: 0.0047 - reg_l2_loss: 0.0637 - loss: 0.7393 - learning_rate: 4.2906e-04 - gradient_norm: 2.6181 - val_det_loss: 0.8418 - val_cls_loss: 0.5496 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9055
Epoch 44/50
21/21 [==============================] - 8s 414ms/step - det_loss: 0.6391 - cls_loss: 0.4152 - box_loss: 0.0045 - reg_l2_loss: 0.0637 - loss: 0.7028 - learning_rate: 3.0876e-04 - gradient_norm: 2.3177 - val_det_loss: 0.8376 - val_cls_loss: 0.5452 - val_box_loss: 0.0058 - val_reg_l2_loss: 0.0637 - val_loss: 0.9013
Epoch 45/50
21/21 [==============================] - 9s 436ms/step - det_loss: 0.6522 - cls_loss: 0.4257 - box_loss: 0.0045 - reg_l2_loss: 0.0637 - loss: 0.7159 - learning_rate: 2.0774e-04 - gradient_norm: 2.3584 - val_det_loss: 0.8406 - val_cls_loss: 0.5476 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9043
Epoch 46/50
21/21 [==============================] - 8s 384ms/step - det_loss: 0.6506 - cls_loss: 0.4302 - box_loss: 0.0044 - reg_l2_loss: 0.0637 - loss: 0.7143 - learning_rate: 1.2641e-04 - gradient_norm: 2.3207 - val_det_loss: 0.8422 - val_cls_loss: 0.5480 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9059
Epoch 47/50
21/21 [==============================] - 9s 433ms/step - det_loss: 0.6298 - cls_loss: 0.4192 - box_loss: 0.0042 - reg_l2_loss: 0.0637 - loss: 0.6935 - learning_rate: 6.5107e-05 - gradient_norm: 2.3190 - val_det_loss: 0.8389 - val_cls_loss: 0.5457 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9026
Epoch 48/50
21/21 [==============================] - 9s 430ms/step - det_loss: 0.6452 - cls_loss: 0.4232 - box_loss: 0.0044 - reg_l2_loss: 0.0637 - loss: 0.7089 - learning_rate: 2.4083e-05 - gradient_norm: 2.4217 - val_det_loss: 0.8381 - val_cls_loss: 0.5451 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9018
Epoch 49/50
21/21 [==============================] - 8s 417ms/step - det_loss: 0.6723 - cls_loss: 0.4453 - box_loss: 0.0045 - reg_l2_loss: 0.0637 - loss: 0.7360 - learning_rate: 3.5074e-06 - gradient_norm: 2.3262 - val_det_loss: 0.8362 - val_cls_loss: 0.5435 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.8999
Epoch 50/50
21/21 [==============================] - 8s 409ms/step - det_loss: 0.6715 - cls_loss: 0.4342 - box_loss: 0.0047 - reg_l2_loss: 0.0637 - loss: 0.7352 - learning_rate: 3.4629e-06 - gradient_norm: 2.5055 - val_det_loss: 0.8370 - val_cls_loss: 0.5437 - val_box_loss: 0.0059 - val_reg_l2_loss: 0.0637 - val_loss: 0.9007
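The 21 steps per epoch reported above follow directly from the dataset and batch sizes, assuming the training loop drops the final partial batch (which matches the 21/21 shown in the logs):

```python
TRAIN_IMAGES = 175  # training images in the Salads dataset
BATCH_SIZE = 8      # batch_size passed to object_detector.create

# The final partial batch (175 % 8 = 7 images) is dropped, leaving 21 full steps.
steps_per_epoch = TRAIN_IMAGES // BATCH_SIZE
print(steps_per_epoch)  # 21
```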

Step 4. Evaluate the model with the test data.

After training the object detection model using the images in the training dataset, use the remaining 25 images in the test dataset to evaluate how the model performs against new data it has never seen before.

As the default batch size is 64, it will take 1 step to go through the 25 images in the test dataset.

The evaluation metrics are the same as COCO's.

model.evaluate(test_data)
1/1 [==============================] - 6s 6s/step
{'AP': 0.20494653,
 'AP50': 0.36424518,
 'AP75': 0.21550791,
 'APs': -1.0,
 'APm': 0.4068788,
 'APl': 0.20514555,
 'ARmax1': 0.16141042,
 'ARmax10': 0.3395588,
 'ARmax100': 0.3877061,
 'ARs': -1.0,
 'ARm': 0.55833334,
 'ARl': 0.3856298,
 'AP_/Baked Goods': 0.06853136,
 'AP_/Salad': 0.50613815,
 'AP_/Cheese': 0.186581,
 'AP_/Seafood': 0.008855937,
 'AP_/Tomato': 0.25462624}
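As a sanity check, the overall 'AP' reported above is (up to floating-point rounding) the mean of the five per-class APs:

```python
# Per-class APs copied from the evaluation output above.
per_class_ap = {
    'Baked Goods': 0.06853136,
    'Salad': 0.50613815,
    'Cheese': 0.186581,
    'Seafood': 0.008855937,
    'Tomato': 0.25462624,
}

# Each per-class AP is already averaged over the COCO IoU thresholds,
# so averaging over classes reproduces the reported overall 'AP'.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(mean_ap)  # matches the reported 'AP' of 0.20494653
```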

Step 5. Export as a TensorFlow Lite model.

Export the trained object detection model to the TensorFlow Lite format by specifying which folder you want to export the quantized model to. The default post-training quantization technique is full integer quantization.

model.export(export_dir='.')
fully_quantize: 0, inference_type: 6, input_inference_type: 3, output_inference_type: 0

Step 6. Evaluate the TensorFlow Lite model.

Several factors can affect the model accuracy when exporting to TFLite:

  • Quantization helps shrink the model size by 4 times at the expense of some accuracy drop.
  • The original TensorFlow model uses per-class non-max suppression (NMS) for post-processing, while the TFLite model uses global NMS, which is much faster but less accurate. Keras outputs a maximum of 100 detections while TFLite outputs a maximum of 25 detections.

Therefore, you'll have to evaluate the exported TFLite model and compare its accuracy with that of the original TensorFlow model.

model.evaluate_tflite('model.tflite', test_data)
25/25 [==============================] - 60s 2s/step
{'AP': 0.17693739,
 'AP50': 0.30656606,
 'AP75': 0.20315194,
 'APs': -1.0,
 'APm': 0.45712852,
 'APl': 0.17589222,
 'ARmax1': 0.125332,
 'ARmax10': 0.23722705,
 'ARmax100': 0.25630012,
 'ARs': -1.0,
 'ARm': 0.55833334,
 'ARl': 0.2533212,
 'AP_/Baked Goods': 0.0,
 'AP_/Salad': 0.48429495,
 'AP_/Cheese': 0.17168286,
 'AP_/Seafood': 0.00038080732,
 'AP_/Tomato': 0.22832833}

You can download the TensorFlow Lite model file using the left sidebar of Colab. Right-click on the model.tflite file and choose Download to download it to your local computer.

This model can be integrated into an Android or an iOS app using the ObjectDetector API of the TensorFlow Lite Task Library.

See the TFLite Object Detection sample app for more details on how the model is used in a working app.

(Optional) Test the TFLite model on your image

You can test the trained TFLite model using images from the internet.

  • Replace the INPUT_IMAGE_URL below with your desired input image.
  • Adjust the DETECTION_THRESHOLD to change the sensitivity of the model. A lower threshold means the model will pick up more objects, but there will also be more false detections. Meanwhile, a higher threshold means the model will only pick up objects that it has confidently detected.

Although running the model in Python requires some boilerplate code at the moment, integrating the model into a mobile app only requires a few lines of code.

Load the trained TFLite model and define some visualization functions
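The notebook cells for this step are omitted here, but as a rough sketch, loading the exported model with the TFLite Interpreter might look like the snippet below. The `preprocess` helper is an illustrative plain-NumPy nearest-neighbor resize; `model.tflite` is assumed to be the file exported above.

```python
import numpy as np

def preprocess(image, input_size):
    """Resize an HxWx3 uint8 image to the model's input size with
    nearest-neighbor sampling and add a batch dimension."""
    h, w, _ = image.shape
    th, tw = input_size
    rows = np.arange(th) * h // th   # source row for each output row
    cols = np.arange(tw) * w // tw   # source column for each output column
    resized = image[rows[:, None], cols[None, :]]
    return resized[np.newaxis, ...].astype(np.uint8)

# Running the exported model (requires the model.tflite file from above):
# import tensorflow as tf
# interpreter = tf.lite.Interpreter(model_path='model.tflite')
# interpreter.allocate_tensors()
# input_detail = interpreter.get_input_details()[0]
# _, th, tw, _ = input_detail['shape']
# interpreter.set_tensor(input_detail['index'], preprocess(img, (th, tw)))
# interpreter.invoke()
```

The commented portion follows the standard `tf.lite.Interpreter` flow; the full notebook additionally decodes the output tensors into boxes, classes, and scores for visualization.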

Run object detection and show the detection results

png

(Optional) Compile For the Edge TPU

Now that you have a quantized EfficientDet Lite model, you can compile it and deploy it to a Coral Edge TPU.

Step 1. Install the EdgeTPU Compiler

 curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

 echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

 sudo apt-get update

 sudo apt-get install edgetpu-compiler
OK
deb https://packages.cloud.google.com/apt coral-edgetpu-stable main
Get:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease [1484 B]
Get:2 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1481 B]
Hit:3 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease
Ign:4 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:5 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:6 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic InRelease
Hit:7 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:8 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic-backports InRelease
Get:9 https://packages.cloud.google.com/apt coral-edgetpu-stable InRelease [6722 B]
Get:10 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:12 http://packages.cloud.google.com/apt google-cloud-logging-wheezy InRelease [5483 B]
Hit:13 http://archive.canonical.com/ubuntu bionic InRelease
Get:14 https://packages.cloud.google.com/apt eip-cloud-bionic InRelease [5419 B]
Ign:15 https://packages.cloud.google.com/apt coral-edgetpu-stable/main amd64 Packages
Get:15 https://packages.cloud.google.com/apt coral-edgetpu-stable/main amd64 Packages [2327 B]
Fetched 112 kB in 1s (104 kB/s)

The following packages were automatically installed and are no longer required:
  linux-gcp-5.4-headers-5.4.0-1040 linux-gcp-5.4-headers-5.4.0-1043
  linux-gcp-5.4-headers-5.4.0-1044 linux-headers-5.4.0-1043-gcp
  linux-headers-5.4.0-1044-gcp linux-image-5.4.0-1044-gcp
  linux-modules-5.4.0-1044-gcp linux-modules-extra-5.4.0-1044-gcp
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  edgetpu-compiler
0 upgraded, 1 newly installed, 0 to remove and 114 not upgraded.
Need to get 7913 kB of archives.
After this operation, 31.2 MB of additional disk space will be used.
Get:1 https://packages.cloud.google.com/apt coral-edgetpu-stable/main amd64 edgetpu-compiler amd64 16.0 [7913 kB]
Fetched 7913 kB in 0s (27.9 MB/s)
Selecting previously unselected package edgetpu-compiler.
(Reading database ... 275858 files and directories currently installed.)
Preparing to unpack .../edgetpu-compiler_16.0_amd64.deb ...
Unpacking edgetpu-compiler (16.0) ...
Setting up edgetpu-compiler (16.0) ...
Processing triggers for libc-bin (2.27-3ubuntu1.2) ...

Step 2. Select number of Edge TPUs, Compile

The Edge TPU has 8MB of SRAM for caching model parameters (more info). This means that for models larger than 8MB, inference time will increase in order to stream in the model parameters. One way to avoid this is Model Pipelining - splitting the model into segments that each run on a dedicated Edge TPU. This can significantly improve latency.

The table below can be used as a reference for the number of Edge TPUs to use - the larger models will not compile for a single TPU because the intermediate tensors can't fit in on-chip memory.

Model architecture   Minimum TPUs   Recommended TPUs
EfficientDet-Lite0   1              1
EfficientDet-Lite1   1              1
EfficientDet-Lite2   1              2
EfficientDet-Lite3   2              2
EfficientDet-Lite4   2              3
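As a rough illustration of the arithmetic behind this table, a model's parameters must fit in the 8MB on-chip SRAM to avoid streaming from off-chip memory. The helper below is a hypothetical back-of-the-envelope estimate only; the real compiler also accounts for intermediate tensors, which is why larger models need more TPUs than raw size alone suggests.

```python
EDGETPU_SRAM_BYTES = 8 * 1024 * 1024  # per-TPU parameter cache

def min_segments(model_bytes, sram_bytes=EDGETPU_SRAM_BYTES):
    """Rough lower bound on the number of pipeline segments needed so
    that each segment's parameters fit in on-chip SRAM. Illustrative
    only -- not how the Edge TPU Compiler actually partitions a model."""
    return max(1, -(-model_bytes // sram_bytes))  # ceiling division
```

For example, by this estimate a 14MB model would need at least 2 segments, matching the intuition that the larger EfficientDet-Lite variants benefit from multiple TPUs.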

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 4241 ms.

Input model: model.tflite
Input size: 4.22MiB
Output model: model_edgetpu.tflite
Output size: 5.61MiB
On-chip memory used for caching model parameters: 4.24MiB
On-chip memory remaining for caching model parameters: 3.27MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 267
Operation log: model_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 264
Number of operations that will run on CPU: 3
See the operation log file for individual operation details.
Compilation child process completed within timeout period.
Compilation succeeded!

Step 3. Download, Run Model

With the model(s) compiled, they can now be run on EdgeTPU(s) for object detection. First, download the compiled TensorFlow Lite model file using the left sidebar of Colab. Right-click on the model_edgetpu.tflite file and choose Download to download it to your local computer.

Now you can run the model in your preferred manner. Examples of detection include:

Advanced Usage

This section covers advanced usage topics like adjusting the model and the training hyperparameters.

Load the dataset

Load your own data

You can upload your own dataset to work through this tutorial. Upload your dataset by using the left sidebar in Colab.

Upload File

If you prefer not to upload your dataset to the cloud, you can also run the library locally by following the guide.

Load your data with a different data format

The Model Maker library also supports the object_detector.DataLoader.from_pascal_voc method to load data in the PASCAL VOC format. makesense.ai and LabelImg are tools that can annotate images and save the annotations as XML files in the PASCAL VOC format:

object_detector.DataLoader.from_pascal_voc(image_dir, annotations_dir, label_map={1: "person", 2: "notperson"})
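For reference, a minimal PASCAL VOC annotation file for one image might look like the sketch below; the file name, image dimensions, and label are illustrative, and from_pascal_voc expects one such XML per image in annotations_dir.

```xml
<annotation>
  <filename>image_001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>85</ymin>
      <xmax>310</xmax>
      <ymax>420</ymax>
    </bndbox>
  </object>
</annotation>
```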

Customize the EfficientDet model hyperparameters

The model and training pipeline parameters you can adjust are:

  • model_dir: The location to save the model checkpoint files. If not set, a temporary directory will be used.
  • steps_per_execution: Number of steps per training execution.
  • moving_average_decay: Float. The decay to use for maintaining moving averages of the trained parameters.
  • var_freeze_expr: The regular expression mapping the prefix names of variables to be frozen, meaning they remain unchanged during training. More specifically, the codebase uses re.match(var_freeze_expr, variable_name) to select the variables to freeze.
  • tflite_max_detections: integer, 25 by default. The max number of output detections in the TFLite model.
  • strategy: A string specifying which distribution strategy to use. Accepted values are 'tpu', 'gpus', and None. 'tpu' means to use TPUStrategy. 'gpus' means to use MirroredStrategy for multiple GPUs. If None, use the TF default with OneDeviceStrategy.
  • tpu: The Cloud TPU to use for training. This should be either the name used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 url.
  • use_xla: Use XLA even if strategy is not tpu. If strategy is tpu, XLA is always used and this flag has no effect.
  • profile: Enable profile mode.
  • debug: Enable debug mode.

Other parameters that can be adjusted are shown in hparams_config.py.

For instance, you can set var_freeze_expr='efficientnet' to freeze the variables with the name prefix efficientnet (the default is '(efficientnet|fpn_cells|resample_p6)'). This freezes those variables so their values remain the same throughout training.

spec = model_spec.get('efficientdet-lite0')
spec.config.var_freeze_expr = 'efficientnet'
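To see which variables a given var_freeze_expr would freeze, you can reproduce the re.match check yourself. The variable names below are illustrative, not taken from the actual model:

```python
import re

def frozen(var_names, var_freeze_expr):
    """Return the variables whose names match the freeze expression,
    mirroring the re.match(var_freeze_expr, variable_name) check."""
    return [n for n in var_names if re.match(var_freeze_expr, n)]

names = ['efficientnet-lite0/stem/conv2d/kernel',
         'fpn_cells/cell_0/fnode0/op_after_combine5/conv/kernel',
         'class_net/class-predict/bias']

# The default expression freezes the backbone and the FPN:
frozen(names, '(efficientnet|fpn_cells|resample_p6)')
# 'efficientnet' alone freezes only the backbone:
frozen(names, 'efficientnet')
```

Because re.match anchors at the start of the string, the expression acts as a prefix filter on variable names; the detection heads (class_net, box_net) never match and so always remain trainable.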

Change the Model Architecture

You can change the model architecture by changing the model_spec. For instance, change the model_spec to the EfficientDet-Lite4 model.

spec = model_spec.get('efficientdet-lite4')

Tune the training hyperparameters

The create function is the driver function that the Model Maker library uses to create models. The model_spec parameter defines the model specification. The object_detector.EfficientDetSpec class is currently supported. The create function comprises the following steps:

  1. Creates the model for the object detection according to model_spec.
  2. Trains the model. The default number of epochs and the default batch size are set by the epochs and batch_size variables in the model_spec object. You can also tune training hyperparameters like epochs and batch_size that affect the model accuracy. For instance,
  • epochs: Integer, 50 by default. More epochs could achieve better accuracy, but may lead to overfitting.
  • batch_size: Integer, 64 by default. The number of samples to use in one training step.
  • train_whole_model: Boolean, False by default. If true, train the whole model. Otherwise, only train the layers that do not match var_freeze_expr.

For example, you can train with fewer epochs and only the head layer. You can increase the number of epochs for better results.

model = object_detector.create(train_data, model_spec=spec, epochs=10, validation_data=validation_data)

Export to different formats

The export formats can be one or a list of the following:

By default, it exports only the TensorFlow Lite model file containing the model metadata so that you can later use it in an on-device ML application. The label file is embedded in the metadata.

In many on-device ML applications, the model size is an important factor. Therefore, it is recommended that you quantize the model to make it smaller and potentially faster. For EfficientDet-Lite models, full integer quantization is used by default. Please refer to Post-training quantization for more detail.

model.export(export_dir='.')

You can also choose to export other files related to the model for better examination. For instance, exporting both the saved model and the label file as follows:

model.export(export_dir='.', export_format=[ExportFormat.SAVED_MODEL, ExportFormat.LABEL])

Customize Post-training quantization on the TensorFlow Lite model

Post-training quantization is a conversion technique that can reduce model size and inference latency, while also improving CPU and hardware accelerator inference speed, with only a small degradation in model accuracy. Thus, it's widely used to optimize the model.

The Model Maker library applies a default post-training quantization technique when exporting the model. If you want to customize it, Model Maker also supports multiple post-training quantization options using QuantizationConfig. Let's take float16 quantization as an example. First, define the quantization config.

from tflite_model_maker.config import QuantizationConfig

config = QuantizationConfig.for_float16()

Then we export the TensorFlow Lite model with this configuration.

model.export(export_dir='.', tflite_filename='model_fp16.tflite', quantization_config=config)

Read more

You can read our object detection example to learn the technical details. For more information, please refer to: