Optical character recognition (OCR) is the process of recognizing characters from images using computer vision and machine learning techniques. This reference app demonstrates how to use TensorFlow Lite for OCR. It combines a text detection model and a text recognition model into an OCR pipeline that recognizes text characters.
If you are new to TensorFlow Lite and are working with Android, we recommend exploring the following example application that can help you get started.
How it works
OCR tasks are often broken down into two stages. First, a text detection model detects bounding boxes around possible text. Second, the processed bounding boxes are fed into a text recognition model, which determines the specific characters inside each box. Before text recognition, the detected boxes also need post-processing such as non-maximum suppression and perspective transformation. In our case, both models are from TensorFlow Hub and are FP16 quantized.
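The non-maximum suppression step between detection and recognition can be sketched as follows. This is a minimal illustration and not the app's implementation: it assumes axis-aligned boxes given as (x1, y1, x2, y2), and the function names and threshold values are hypothetical. Text detectors often produce rotated boxes, in which case the overlap would have to be computed on polygons instead.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of boxes to keep, highest score first.

    Thresholds are illustrative assumptions, not values from the example app.
    """
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 260, 80), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 is suppressed because it overlaps heavily with the higher-scoring box 0, and box 3 is dropped by the score threshold. Only the surviving boxes would be cropped and passed to the recognition model.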
Performance benchmark numbers are generated with the tool described here.
Device: Pixel 4 (Android 10)
* 4 threads used.
** This model cannot use the GPU delegate because it requires TensorFlow ops to run.
The text detection model accepts a 4-D float32 Tensor of shape (1, 320, 320, 3) as input.
The text recognition model accepts a 4-D float32 Tensor of shape (1, 31, 200, 1) as input.
The text detection model returns a 4-D float32 Tensor of shape (1, 80, 80, 5) as the bounding boxes and a 4-D float32 Tensor of shape (1, 80, 80, 5) as the detection scores.
The text recognition model returns a 2-D float32 Tensor of shape (1, 48) containing the indices that map to the alphabet list '0123456789abcdefghijklmnopqrstuvwxyz'.
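To make the recognition output concrete, here is a minimal sketch of turning one output row of 48 values into a string using the alphabet list. The function name is hypothetical, and the handling of unused positions is an assumption: this sketch simply skips any value that falls outside the alphabet range, which may differ from how the model actually marks padding.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def decode_indices(output_row, alphabet=ALPHABET):
    """Map a row of float index values to characters of the alphabet.

    Values outside the valid index range are treated as padding and skipped
    (an assumption made for this sketch).
    """
    chars = []
    for value in output_row:
        index = int(round(value))
        if 0 <= index < len(alphabet):
            chars.append(alphabet[index])
    return "".join(chars)


# A made-up output row of length 48: six valid indices followed by padding.
row = [16.0, 24.0, 24.0, 16.0, 21.0, 14.0] + [-1.0] * 42
print(decode_indices(row))  # google
```

With a real interpreter, the row would be the first (and only) entry of the (1, 48) output tensor.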
The current text recognition model is trained using synthetic data with English letters and numbers, so only English is supported.
The models are not general enough for OCR in the wild (say, random images taken by a smartphone camera in low light).
For that reason, this example uses only three Google product logos to demonstrate how to do OCR with TensorFlow Lite. If you are looking for a ready-to-use, production-grade OCR product, consider Google ML Kit. ML Kit, which uses TFLite underneath, should be sufficient for most OCR use cases, but in some cases you may want to build your own OCR solution with TFLite. Some examples are:
- You have your own text detection/recognition TFLite models that you would like to use
- You have special business requirements (e.g., recognizing text that is upside down) and need to customize the OCR pipeline
- You want to support languages not covered by ML Kit
- Your target user devices don’t necessarily have Google Play services installed
References
- OpenCV text detection/recognition example: https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp
- OCR TFLite community project: https://github.com/tulasiram58827/ocr_tflite
- OpenCV text detection: https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/
- Deep Learning based Text Detection Using OpenCV: https://learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/