Segmentation

Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as image objects). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.

The following image shows the output of the image segmentation model on Android. The model creates a mask over the target objects with high accuracy.

Get started

If you are new to TensorFlow Lite and are working with Android or iOS, it is recommended you explore the following example applications that can help you get started.

You can leverage the out-of-the-box API from the TensorFlow Lite Task Library to integrate image segmentation models within just a few lines of code. You can also integrate the model using the TensorFlow Lite Interpreter Java API.
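As a minimal sketch of the Task Library path, the snippet below loads a segmentation model bundled in the app's assets and runs it on a bitmap. The file name deeplabv3.tflite is a placeholder; ImageSegmenter and TensorImage come from the TensorFlow Lite Task Library and Support Library.

```java
import android.content.Context;
import android.graphics.Bitmap;
import java.io.IOException;
import java.util.List;
import org.tensorflow.lite.support.image.TensorImage;
import org.tensorflow.lite.task.vision.segmenter.ImageSegmenter;
import org.tensorflow.lite.task.vision.segmenter.Segmentation;

public final class SegmenterExample {
  public static List<Segmentation> segment(Context context, Bitmap bitmap)
      throws IOException {
    // "deeplabv3.tflite" is a placeholder for the model file in the app's assets.
    ImageSegmenter segmenter =
        ImageSegmenter.createFromFile(context, "deeplabv3.tflite");
    // Wrap the bitmap; the Task Library applies the resizing and normalization
    // declared in the model metadata before inference.
    TensorImage image = TensorImage.fromBitmap(bitmap);
    // Each Segmentation result holds a category mask plus its colored labels.
    return segmenter.segment(image);
  }
}
```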

The Android example below demonstrates the implementation for both methods as lib_task_api and lib_interpreter, respectively.

View Android example

View iOS example

If you are using a platform other than Android or iOS, or you are already familiar with the TensorFlow Lite APIs, you can download our starter image segmentation model.

Download starter model

Model description

DeepLab is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g. person, dog, cat) to every pixel in the input image.

How it works

Semantic image segmentation predicts whether each pixel of an image is associated with a certain class. This is in contrast to object detection, which detects objects in rectangular regions, and image classification, which classifies the overall image.
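To make the per-pixel prediction concrete, here is a minimal sketch of decoding a raw output tensor into a label map, assuming the output has shape [1, height, width, numClasses] as the starter DeepLab model produces (1 x 257 x 257 x 21); the method and class indices shown are illustrative.

```java
public final class SegmentationPostprocess {
  /**
   * Converts raw model output of shape [1, height, width, numClasses]
   * (e.g. 1 x 257 x 257 x 21 for the starter DeepLab model) into a
   * per-pixel label map by taking the argmax over the class scores.
   */
  public static int[][] toLabelMap(float[][][][] output) {
    int height = output[0].length;
    int width = output[0][0].length;
    int numClasses = output[0][0][0].length;
    int[][] labels = new int[height][width];
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
          if (output[0][y][x][c] > output[0][y][x][best]) {
            best = c;
          }
        }
        labels[y][x] = best; // class index per pixel, e.g. 0 = background
      }
    }
    return labels;
  }
}
```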

The current implementation includes the following features:

  1. DeepLabv1: We use atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks (see the sketch after this list).
  2. DeepLabv2: We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields-of-view.
  3. DeepLabv3: We augment the ASPP module with image-level features [5, 6] to capture longer-range information. We also include batch normalization [7] parameters to facilitate training. In particular, we apply atrous convolution to extract output features at different output strides during training and evaluation, which efficiently enables training batch normalization at output stride = 16 and attains high performance at output stride = 8 during evaluation.
  4. DeepLabv3+: We extend DeepLabv3 with a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. Furthermore, in this encoder-decoder structure one can arbitrarily control the resolution of extracted encoder features via atrous convolution to trade off precision and runtime.
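As a rough illustration of the atrous convolution these variants build on, the 1-D toy sketch below shows how a dilation rate spreads the filter taps apart, enlarging the receptive field without adding parameters or downsampling. This is an assumption-laden teaching example, not the DeepLab implementation.

```java
public final class AtrousConv1D {
  /**
   * Minimal 1-D atrous (dilated) convolution: with dilation rate r the
   * filter taps are spaced r samples apart, so the receptive field grows
   * from k to (k - 1) * r + 1 without extra parameters or downsampling.
   */
  public static float[] apply(float[] input, float[] filter, int rate) {
    int span = (filter.length - 1) * rate; // receptive field minus one
    float[] output = new float[Math.max(0, input.length - span)];
    for (int i = 0; i < output.length; i++) {
      float sum = 0f;
      for (int k = 0; k < filter.length; k++) {
        sum += filter[k] * input[i + k * rate]; // taps spaced `rate` apart
      }
      output[i] = sum;
    }
    return output;
  }
}
```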

Performance benchmarks

Performance benchmark numbers are generated with the tool described here.

Model Name   Model size   Device                    GPU      CPU
Deeplab v3   2.7 MB       Pixel 3 (Android 10)      16 ms    37 ms*
                          Pixel 4 (Android 10)      20 ms    23 ms*
                          iPhone XS (iOS 12.4.1)    16 ms    25 ms**

* 4 threads used.

** 2 threads used on iPhone for the best performance result.

Further reading and resources