TensorFlow Lite is an open source deep learning framework for on-device inference.
Pick a new model or retrain an existing one.
Convert a TensorFlow model into a compressed FlatBuffer with the TensorFlow Lite Converter.
Take the compressed .tflite file and load it into a mobile or embedded device.
Optimize for size and speed: quantize by converting 32-bit floats to more efficient 8-bit integers, or run inference on the GPU.
Explore optimized models to help with common mobile and edge use cases.
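The convert-and-deploy steps above can be sketched end to end in Python with the standard tf.lite APIs. The tiny Doubler module here is a hypothetical stand-in for a trained model; on a real device the interpreter would load the same .tflite bytes from a file.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a trained model: a module that doubles its input.
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return 2.0 * x

model = Doubler()

# Convert the model into a compressed TensorFlow Lite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()], model)
tflite_model = converter.convert()

# Load the .tflite bytes and run inference, as a device would.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]
interpreter.set_tensor(input_detail["index"],
                       np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32))
interpreter.invoke()
result = interpreter.get_tensor(output_detail["index"])  # doubled input
```

The same converter also accepts SavedModels (tf.lite.TFLiteConverter.from_saved_model), which is the usual path for models trained and exported separately.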
Integer quantization is a new addition to the TensorFlow Model Optimization Toolkit. It is a general technique that reduces the numerical precision of a model's weights and activations to lower memory usage and improve latency.
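The core of integer quantization is an affine mapping between a float range and an integer range. This is a minimal pure-Python sketch of that arithmetic, not the toolkit's implementation; the function names are illustrative, and the quantizer assumes the values span a nonzero range.

```python
def quantize(values, num_bits=8):
    """Affine (asymmetric) quantization of floats to signed integers.

    Assumes max(values) > min(values).
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # int8: -128..127
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)      # float step per integer level
    zero_point = round(qmin - lo / scale)  # integer that represents 0.0
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

# Example: four float32-style weights stored in 8 bits each.
weights = [-1.0, 0.0, 0.5, 2.0]
q, scale, zero_point = quantize(weights)
recovered = dequantize(q, scale, zero_point)
```

Each value now occupies 8 bits instead of 32, at the cost of a rounding error of at most half a quantization step (scale / 2).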
Weight pruning, another new addition to the TensorFlow Model Optimization Toolkit, reduces the number of parameters and operations involved in the computation by removing connections (and thus parameters) between neural network layers.
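The usual criterion is magnitude: connections whose weights are closest to zero contribute least and are removed first. This is a minimal pure-Python sketch of that idea, not the toolkit's API; the function name is hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (magnitude pruning).

    sparsity: target fraction of weights to remove, e.g. 0.5 for 50%.
    """
    n_zero = int(len(weights) * sparsity)
    # Indices of the n_zero weights closest to zero.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_zero]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

# Example: prune half of a layer's weights.
layer = [0.3, -0.05, 1.2, 0.01, -0.7, 0.2]
sparse_layer = prune_by_magnitude(layer, sparsity=0.5)
```

In the toolkit, sparsity is introduced gradually during training so the remaining weights can compensate; the resulting mostly-zero tensors compress well and can skip work at inference time.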
In this video, you'll learn how to build AI into any device using TensorFlow Lite, and learn about the future of on-device ML and our roadmap. You’ll also discover a library of pretrained models that are ready to use in your apps or to be customized for your needs.
Running inference on the GPU can speed it up by up to ~4x on Pixel 3.