Optical character recognition (OCR)

Optical character recognition (OCR) is the process of recognizing characters from images using computer vision and machine learning techniques. This reference app demonstrates how to use TensorFlow Lite to do OCR. It combines a text detection model and a text recognition model into an OCR pipeline that recognizes text characters.

Get started

If you are new to TensorFlow Lite and are working with Android, we recommend exploring the following example application that can help you get started.

Android example

If you are using a platform other than Android, or you are already familiar with the TensorFlow Lite APIs, you can download the models from TF Hub.

How it works

OCR tasks are often broken down into two stages. First, we use a text detection model to find the bounding boxes around possible text. Second, we feed the processed bounding boxes into a text recognition model, which determines the specific characters inside each box. Between the two stages, the detected boxes need post-processing such as Non-Maximum Suppression and perspective transformation before text recognition can run. In our case, both models come from TensorFlow Hub and are FP16 quantized.
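The Non-Maximum Suppression step can be sketched in plain Python. This is a minimal greedy NMS over axis-aligned boxes `(x1, y1, x2, y2)`, assuming each candidate box comes with one confidence score. The function names and the 0.4 IoU threshold are illustrative choices, not values taken from the example app.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.4) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes around the same word collapse to the higher-scoring one, while a separate box elsewhere survives. The kept boxes are what gets cropped and perspective-corrected before recognition.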

Performance benchmarks

Performance benchmark numbers are generated with the tool described here.

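| Model name       | Model size | Device               | CPU        | GPU       |
|------------------|------------|----------------------|------------|-----------|
| Text Detection   | 45.9 MB    | Pixel 4 (Android 10) | 181.93 ms* | 89.77 ms* |
| Text Recognition | 16.8 MB    | Pixel 4 (Android 10) | 338.33 ms* | N/A**     |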

* 4 threads used.

** This model cannot use the GPU delegate because it requires TensorFlow ops to run.
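Because the recognition model's input holds a single crop (batch size 1), its cost is paid once per detected text box. A rough back-of-the-envelope estimate can be computed from the table, assuming latencies simply add up and ignoring NMS, cropping, and other pre- and post-processing. The function name is hypothetical.

```python
# Benchmark numbers from the table above (Pixel 4, milliseconds, 4 threads).
DETECTION_CPU_MS = 181.93
DETECTION_GPU_MS = 89.77
RECOGNITION_CPU_MS = 338.33  # recognition cannot use the GPU delegate


def estimate_pipeline_ms(num_boxes: int, use_gpu_for_detection: bool = False) -> float:
    """Rough end-to-end latency: one detection pass plus one recognition pass per box.

    Assumes latencies add linearly and ignores all other processing overhead.
    """
    detection = DETECTION_GPU_MS if use_gpu_for_detection else DETECTION_CPU_MS
    return detection + num_boxes * RECOGNITION_CPU_MS
```

With three detected words, for example, this gives about 1197 ms on CPU, of which roughly 1015 ms is recognition, so moving detection to the GPU saves comparatively little once several boxes are found.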

Inputs

The text detection model accepts a 4-D float32 tensor of shape (1, 320, 320, 3) as input.

The text recognition model accepts a 4-D float32 tensor of shape (1, 31, 200, 1) as input.
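Both input shapes use the (batch, height, width, channels) layout. The sketch below, in plain Python with nested lists, shows one way to bring an RGB image to those shapes: nearest-neighbor resizing for both models, plus conversion to a single luminance channel for the recognition model. The resizing method, the BT.601 grayscale weights, the scaling to [0, 1], and the helper names are assumptions for illustration; the example app may resize and normalize differently.

```python
from typing import List, Sequence

Pixel = Sequence[float]  # (r, g, b) values in 0..255


def to_input_tensor(image: Sequence[Sequence[Pixel]], height: int, width: int,
                    grayscale: bool = False) -> List[List[List[List[float]]]]:
    """Resize an RGB image (rows of (r, g, b) pixels) to a (1, height, width, C) nested list.

    Uses nearest-neighbor sampling; C is 1 when grayscale is True, otherwise 3.
    Values are scaled to [0, 1], which is an assumed normalization.
    """
    src_h, src_w = len(image), len(image[0])
    rows: List[List[List[float]]] = []
    for y in range(height):
        src_y = min(src_h - 1, y * src_h // height)
        row: List[List[float]] = []
        for x in range(width):
            src_x = min(src_w - 1, x * src_w // width)
            r, g, b = image[src_y][src_x]
            if grayscale:
                # ITU-R BT.601 luma weights collapse RGB into one channel.
                row.append([(0.299 * r + 0.587 * g + 0.114 * b) / 255.0])
            else:
                row.append([r / 255.0, g / 255.0, b / 255.0])
        rows.append(row)
    return [rows]  # add the batch dimension


def shape(t) -> tuple:
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)
```

Calling `to_input_tensor(image, 320, 320)` yields the detection shape (1, 320, 320, 3), and `to_input_tensor(crop, 31, 200, grayscale=True)` yields the recognition shape (1, 31, 200, 1).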

Outputs

The text detection model returns two 4-D float32 tensors: one of shape (1, 80, 80, 5) holding the bounding box geometry, and one of shape (1, 80, 80, 5) holding the detection scores.
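Each of the 80 × 80 output cells corresponds to a 4 × 4 pixel patch of the 320 × 320 input (320 / 80 = 4). The sketch below turns the two outputs into axis-aligned candidate boxes in input-pixel coordinates, ready for Non-Maximum Suppression. It assumes an EAST-style layout, in which the five geometry channels are the distances from the cell to the top, right, bottom and left edges of the box followed by a rotation angle in radians, and it reads each cell's score from channel 0. That channel order, the 0.5 score threshold, and the simplified (axis-aligned) rotation handling are assumptions, not details documented here.

```python
import math
from typing import List, Tuple

STRIDE = 4  # 320-pixel input / 80 output cells


def decode_boxes(scores, geometry, score_threshold: float = 0.5
                 ) -> Tuple[List[Tuple[float, float, float, float]], List[float]]:
    """Decode (1, 80, 80, C) score and (1, 80, 80, 5) geometry nested lists into boxes.

    Assumed geometry channels: [top, right, bottom, left, angle].
    Returns axis-aligned (x1, y1, x2, y2) boxes and their scores.
    """
    boxes, confidences = [], []
    for y, row in enumerate(scores[0]):
        for x, cell in enumerate(row):
            score = cell[0]
            if score < score_threshold:
                continue
            top, right, bottom, left, angle = geometry[0][y][x]
            cos_a, sin_a = math.cos(angle), math.sin(angle)
            offset_x, offset_y = x * STRIDE, y * STRIDE
            # Rotate the right/bottom distances to locate the bottom-right corner.
            x2 = offset_x + cos_a * right + sin_a * bottom
            y2 = offset_y - sin_a * right + cos_a * bottom
            # Simplification: keep the unrotated width/height, giving an axis-aligned box.
            w, h = left + right, top + bottom
            boxes.append((x2 - w, y2 - h, x2, y2))
            confidences.append(score)
    return boxes, confidences
```

The returned boxes and scores can be passed straight to an NMS routine; handling the angle fully would instead produce rotated quadrilaterals, which is where the perspective transformation comes in.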

The text recognition model returns a 2-D float32 tensor of shape (1, 48) whose values are indices into the alphabet '0123456789abcdefghijklmnopqrstuvwxyz'.
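Turning the recognition output into a string is then a lookup into that alphabet. This sketch rounds each value to an integer index and skips any index outside the 36-character alphabet, on the assumption that such values mark blank or padding positions; the source does not say how the example app treats them.

```python
from typing import Sequence

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def decode_text(indices: Sequence[Sequence[float]]) -> str:
    """Map a (1, 48) output of alphabet indices to a string.

    Values are rounded to int; out-of-range values are treated as blanks (assumption).
    """
    chars = []
    for value in indices[0]:
        i = int(round(value))
        if 0 <= i < len(ALPHABET):
            chars.append(ALPHABET[i])
    return "".join(chars)
```

For instance, an output starting with 16, 24, 24, 16, 21, 14 and padded with out-of-range values decodes to "google".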

Limitations

  • The current text recognition model is trained using synthetic data with English letters and numbers, so only English is supported.

  • The models are not general enough for OCR in the wild (for example, random images taken with a smartphone camera in low light).

For this reason, we have chosen only three Google product logos to demonstrate how to do OCR with TensorFlow Lite. If you are looking for a ready-to-use, production-grade OCR product, consider Google ML Kit. ML Kit, which uses TFLite under the hood, should be sufficient for most OCR use cases, but there are some cases where you may want to build your own OCR solution with TFLite. Some examples are:

  • You have your own text detection/recognition TFLite models that you would like to use
  • You have special business requirements (e.g., recognizing text that is upside down) and need to customize the OCR pipeline
  • You want to support languages not covered by ML Kit
  • Your target user devices don’t necessarily have Google Play services installed

References