TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and IoT devices.
- Optimized for on-device machine learning, addressing five key constraints: latency (no round-trip to a server), privacy (no personal data leaves the device), connectivity (no internet connection required), size (reduced model and binary size), and power consumption (efficient inference and no network connections).
- Multiple platform support, covering Android and iOS devices, embedded Linux, and microcontrollers.
- Diverse language support, which includes Java, Swift, Objective-C, C++, and Python.
- High performance, with hardware acceleration and model optimization.
- End-to-end examples for common machine learning tasks, such as image classification, object detection, pose estimation, question answering, and text classification, on multiple platforms.
The following guide walks through each step of the workflow and provides links to further instructions:
1. Generate a TensorFlow Lite model
A TensorFlow Lite model is represented in an efficient, portable format known as FlatBuffers (identified by the .tflite file extension). This format provides several advantages over TensorFlow's protocol buffer model format, such as reduced size (small code footprint) and faster inference (data is accessed directly without an extra parsing/unpacking step). These properties enable TensorFlow Lite to execute efficiently on devices with limited compute and memory resources.
A TensorFlow Lite model can optionally include metadata: a human-readable model description plus machine-readable data that enables automatic generation of pre- and post-processing pipelines during on-device inference. Refer to Add metadata for more details.
You can generate a TensorFlow Lite model in the following ways:
Use an existing TensorFlow Lite model: Refer to TensorFlow Lite Examples to pick an existing model. Models may or may not contain metadata.
Create a TensorFlow Lite model: Use the TensorFlow Lite Model Maker to create a model with your own custom dataset. By default, all models contain metadata.
Convert a TensorFlow model into a TensorFlow Lite model: Use the TensorFlow Lite Converter to convert a TensorFlow model into a TensorFlow Lite model. During conversion, you can apply optimizations such as quantization to reduce model size and latency with minimal or no loss in accuracy. By default, converted models do not contain metadata.
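The conversion step above can be sketched in a few lines of Python. This is a minimal illustration, assuming a recent TensorFlow 2.x install; the tiny Keras model and the output filename are placeholders, not part of any real workflow:

```python
import tensorflow as tf

# Placeholder model, built only to have something to convert.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optional: apply default optimizations (e.g. dynamic-range quantization)
# to reduce model size and latency.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The result is a FlatBuffer (.tflite); write it out for deployment.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

Converters also exist for SavedModels (`tf.lite.TFLiteConverter.from_saved_model`) and concrete functions, so the same pattern applies whichever form your TensorFlow model takes.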
2. Run inference
Inference refers to the process of executing a TensorFlow Lite model on-device to make predictions based on input data. You can run inference in the following ways based on the model type:
Models without metadata: Use the TensorFlow Lite Interpreter API, which is supported across multiple platforms and languages, including Java, Swift, C++, Objective-C, and Python.
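The Interpreter API workflow (load the model, allocate tensors, set inputs, invoke, read outputs) can be sketched as follows. This is an illustrative example, assuming TensorFlow 2.x; the trivial "double the input" model exists only so the block is self-contained:

```python
import numpy as np
import tensorflow as tf

# Build a trivial model (doubles its input) purely for illustration,
# then convert it so we have a TensorFlow Lite model to run.
@tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
def double(x):
    return 2.0 * x

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [double.get_concrete_function()])
tflite_model = converter.convert()

# The Interpreter can load a model from bytes (model_content) or from
# a file on disk (model_path).
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed input, run inference, read the output tensor.
x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]["index"])
```

The same set-tensor/invoke/get-tensor loop applies regardless of model complexity; only the input and output shapes change.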
Models with metadata: You can either leverage the out-of-box APIs in the TensorFlow Lite Task Library or build custom inference pipelines with the TensorFlow Lite Support Library. On Android devices, you can automatically generate code wrappers using the Android Studio ML Model Binding or the TensorFlow Lite Code Generator. Code generation is currently supported only for Java (Android); support for Swift (iOS) and C++ is work in progress.
On Android and iOS devices, you can improve performance using hardware acceleration. On both platforms you can use a GPU Delegate; on Android you can additionally use the NNAPI Delegate (for newer devices) or the Hexagon Delegate (for older devices), and on iOS you can use the Core ML Delegate. To add support for new hardware accelerators, you can define your own delegate.
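In Python, attaching a delegate to an interpreter looks roughly like the sketch below. The helper function and the delegate library path are hypothetical; `tf.lite.experimental.load_delegate` is the real entry point, but which shared library you load depends on your platform and accelerator:

```python
import tensorflow as tf

def make_interpreter(model_path, delegate_path=None):
    """Build an Interpreter, optionally with a hardware delegate.

    delegate_path is a hypothetical path to a delegate shared library
    (e.g. a GPU or Hexagon delegate .so built for your device).
    """
    delegates = []
    if delegate_path:
        # load_delegate wraps the external delegate library so the
        # interpreter can offload supported ops to the accelerator.
        delegates.append(tf.lite.experimental.load_delegate(delegate_path))
    return tf.lite.Interpreter(model_path=model_path,
                               experimental_delegates=delegates)
```

On Android and iOS the equivalent is an interpreter option (e.g. adding a GPU delegate to the interpreter's options object) rather than loading a shared library by path.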
You can refer to the following guides based on your target device:
Microcontrollers: Explore the TensorFlow Lite for Microcontrollers library, designed for microcontrollers and DSPs with only a few kilobytes of memory.