Deploy machine learning models on mobile and IoT devices
TensorFlow Lite is an open source deep learning framework for on-device inference.
How it works
Convert
Convert a TensorFlow model into a compressed flat buffer with the TensorFlow Lite Converter.
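As a minimal sketch, conversion is a few lines of Python; the SavedModel path and output filename here are placeholders:

```python
import tensorflow as tf

# Load a trained model from a SavedModel directory and convert it
# to the TensorFlow Lite flat buffer format.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
tflite_model = converter.convert()

# Write the serialized flat buffer to disk for deployment.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```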
Deploy
Take the compressed .tflite file and load it into a mobile or embedded device.
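On device, the model runs through the TensorFlow Lite interpreter. A sketch in Python (the same load–invoke flow applies to the Java, Swift, and C++ APIs; the zeroed input is a placeholder for real data):

```python
import numpy as np
import tensorflow as tf

# Load the compressed .tflite model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an input matching the model's expected shape and dtype.
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)

# Run inference and read back the result.
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
```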
Optimize
Quantize by converting 32-bit floats to more efficient 8-bit integers, or run inference on the GPU.
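A minimal sketch of post-training quantization, enabled with a single converter flag (the SavedModel path is again a placeholder):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
# Enable the default post-training optimization, which quantizes
# 32-bit float weights down to 8-bit integers.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()
```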
Solutions to common problems
Explore optimized models to help with common mobile and edge use cases.

Image classification
Identify hundreds of objects, including people, activities, animals, plants, and places.


News & announcements
See updates to help you with your work, and subscribe to our monthly TensorFlow newsletter to get the latest announcements sent directly to your inbox.

Integer quantization is a new addition to the TensorFlow Model Optimization Toolkit. It is a general technique that reduces the numerical precision of a model's weights and activations, cutting memory use and improving latency.
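A sketch of full integer quantization: supplying a representative dataset lets the converter calibrate activation ranges so both weights and activations can be stored as 8-bit integers. The random calibration data and input shape below are placeholder assumptions; in practice you would yield a few hundred real samples from your training set:

```python
import numpy as np
import tensorflow as tf

# Placeholder calibration data; the input shape (1, 224, 224, 3)
# is assumed here for illustration only.
def representative_data():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Calibrate activation ranges so activations can be quantized too,
# not just the weights.
converter.representative_dataset = representative_data
tflite_int8_model = converter.convert()
```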

Weight pruning, a new addition to the TensorFlow Model Optimization Toolkit, aims to reduce the number of parameters and operations involved in the computation by removing connections, and thus parameters, between neural network layers.
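A minimal sketch of magnitude-based pruning with the toolkit's Keras API; the model, training data, and schedule values below are placeholders for illustration:

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model and data; any Keras model can be wrapped for pruning.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
x_train = np.random.rand(256, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=(256,))

# Gradually drive 50% of each layer's weights to zero over 1,000 steps.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
# UpdatePruningStep applies the sparsity schedule as training progresses.
pruned_model.fit(x_train, y_train, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before export; the zeroed weights remain.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
```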

In this video, you'll learn how to build AI into any device using TensorFlow Lite, hear about the future of on-device ML and our roadmap, and discover a library of pretrained models that are ready to use in your apps or to be customized for your needs.

Running inference on the GPU can improve performance by up to ~4x on Pixel 3.
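The GPU path is enabled through a delegate handed to the interpreter. A hedged sketch in Python using an external delegate library; the library filename is platform-specific and assumed here for illustration:

```python
import tensorflow as tf

# Load the GPU delegate as an external library; the .so name below
# is an assumption and varies by platform and build.
gpu_delegate = tf.lite.experimental.load_delegate(
    "libtensorflowlite_gpu_delegate.so")

# Hand the delegate to the interpreter so supported ops run on the GPU.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[gpu_delegate])
interpreter.allocate_tensors()
```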