Image classification

Use a pre-trained and optimized model to identify hundreds of classes of objects, including people, activities, animals, plants, and places.

Get started

If you are unfamiliar with the concept of image classification, you should start by reading What is image classification?

To learn how to use image classification in a mobile app, we recommend exploring our Example applications and guides.

If you are using a platform other than Android or iOS, or you are already familiar with the TensorFlow Lite APIs, you can download our starter image classification model and the accompanying labels.

Download starter model and labels

Once you have the starter model running on your target device, you can experiment with different models to find the optimal balance between performance, accuracy, and model size. For guidance, see Choose a different model.

Example applications and guides

We have example applications for image classification for both Android and iOS. For each example, we provide a guide that explains how it works.

Android

View Android example

Read the Android example guide to learn how the app works.

iOS

View iOS example

Read the iOS example guide to learn how the app works.

Screenshot

The following screenshot shows the Android image classification example.

[Image: screenshot of the Android example]

What is image classification?

A common use of machine learning is to identify what an image represents. For example, we might want to know what type of animal appears in the following photograph.

[Image: a dog]

The task of predicting what an image represents is called image classification. An image classification model is trained to recognize various classes of images. For example, a model might be trained to recognize photos representing three different types of animals: rabbits, hamsters, and dogs.

When we subsequently provide a new image as input to the model, it will output the probabilities of the image representing each of the types of animal it was trained on. An example output might be as follows:

Animal type    Probability
Rabbit         0.07
Hamster        0.02
Dog            0.91

Based on the output, we can see that the classification model has predicted that the image has a high probability of representing a dog.

Training, labels, and inference

During training, an image classification model is fed images and their associated labels. Each label is the name of a distinct concept, or class, that the model will learn to recognize.

Given sufficient training data (often hundreds or thousands of images per label), an image classification model can learn to predict whether new images belong to any of the classes it has been trained on. This process of prediction is called inference.

To perform inference, an image is passed as input to a model. The model will then output an array of probabilities between 0 and 1. With our example model, this process might look like the following:

[Image: a dog] → [0.07, 0.02, 0.91]

Each number in the output corresponds to a label in our training data. Associating our output with the three labels the model was trained on, we can see the model has predicted a high probability that the image represents a dog.

Label      Probability
rabbit     0.07
hamster    0.02
dog        0.91
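
The association between the output array and the labels can be sketched in a few lines of Python. This is a minimal illustration, assuming the label list is in the same order as the model's output array; the names `labels`, `probabilities`, and `top_label` are hypothetical and are not part of any TensorFlow Lite API.

```python
# Hypothetical labels, in the same order as the model's output array.
labels = ["rabbit", "hamster", "dog"]

# Example output array from the text above.
probabilities = [0.07, 0.02, 0.91]


def top_label(labels, probabilities):
    """Return the (label, probability) pair with the highest probability."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    return max(zip(labels, probabilities), key=lambda pair: pair[1])


best_label, best_probability = top_label(labels, probabilities)
print(f"{best_label}: {best_probability:.2f}")  # prints "dog: 0.91"
```

In practice the labels would typically be read from the labels file that accompanies the model, one label per line.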

You might notice that the sum of all the probabilities (for rabbit, hamster, and dog) is equal to 1. This is a common type of output for models with multiple classes (see Softmax for more information).
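
To see why such outputs sum to 1, here is a minimal sketch of the softmax function in plain Python. The raw scores in `logits` are made-up values for illustration and do not come from any real model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability; the result is unchanged.
    highest = max(logits)
    exponentials = [math.exp(value - highest) for value in logits]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Made-up raw scores for rabbit, hamster, and dog.
logits = [1.0, -0.25, 3.5]
probabilities = softmax(logits)

print([round(p, 2) for p in probabilities])
print(round(sum(probabilities), 6))  # the probabilities sum to 1
```

Because every exponential is positive and each is divided by their total, every output lies between 0 and 1 and the largest raw score always receives the largest probability.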

Ambiguous results

Because the probabilities always sum to 1, an image that is not confidently recognized as belonging to any of the classes the model was trained on may produce probabilities spread across the labels, with no single value significantly larger than the others.

For example, the following might indicate an ambiguous result:

Label      Probability
rabbit     0.31
hamster    0.35
dog        0.34
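
One way an application might handle such cases is to accept a prediction only when the top probability reaches a threshold. The sketch below assumes a hypothetical `classify_or_reject` helper and an arbitrary threshold of 0.6; a suitable value depends on your model and application.

```python
def classify_or_reject(labels, probabilities, threshold=0.6):
    """Return the top label, or None if no probability reaches the threshold.

    The default threshold of 0.6 is an arbitrary example value,
    not a recommendation.
    """
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] < threshold:
        return None  # Treat the result as ambiguous.
    return labels[best_index]


labels = ["rabbit", "hamster", "dog"]

print(classify_or_reject(labels, [0.07, 0.02, 0.91]))  # prints "dog"
print(classify_or_reject(labels, [0.31, 0.35, 0.34]))  # prints "None"
```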

Uses and limitations

The image classification models that we provide are useful for single-label classification, which means predicting which single label the image is most likely to represent. They are trained to recognize 1,000 image classes. For a full list of classes, see the labels file in the model zip.

If you want to train a model to recognize new classes, see Customize model.

For the following use cases, you should use a different type of model:

  • Predicting the type and position of one or more objects within an image (see Object detection)
  • Predicting the composition of an image, for example subject versus background (see Segmentation)

Choose a different model

A large number of image classification models are available on our List of hosted models. Aim to choose the model that best fits your application based on performance, accuracy, and model size. These three properties trade off against one another.

Performance

We measure performance in terms of the amount of time it takes for a model to run inference on a given piece of hardware. The less time, the faster the model.

The performance you require depends on your application. Performance matters for applications such as real-time video, where each frame may need to be analyzed before the next one is drawn (for example, on a 30 fps video stream, inference must take less than about 33 ms per frame).
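
The frame budget in this example is simple arithmetic: at 30 frames per second, each frame lasts 1000 / 30, or about 33.3 ms. The following sketch uses hypothetical helper names to check whether a measured inference time fits within that budget. The timings 3.7 ms and 80.3 ms are the fastest and slowest quantized MobileNet figures quoted in this section.

```python
def frame_budget_ms(frames_per_second):
    """Return the time available per frame, in milliseconds."""
    return 1000.0 / frames_per_second


def fits_real_time(inference_ms, frames_per_second):
    """Return True if one inference completes within a single frame interval."""
    return inference_ms < frame_budget_ms(frames_per_second)


print(round(frame_budget_ms(30), 1))  # prints 33.3
print(fits_real_time(3.7, 30))        # prints True
print(fits_real_time(80.3, 30))       # prints False
```

This check considers inference time only. A real application would probably also need to budget for image capture, pre-processing, and drawing.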

Inference time for our quantized MobileNet models ranges from 3.7 ms to 80.3 ms.

Accuracy

We measure accuracy in terms of how often the model correctly classifies an image. For example, a model with a stated accuracy of 60% can be expected to classify an image correctly an average of 60% of the time.

Our list of hosted models provides Top-1 and Top-5 accuracy statistics. Top-1 refers to how often the correct label appears as the label with the highest probability in the model’s output. Top-5 refers to how often the correct label appears in the top 5 highest probabilities in the model’s output.
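
The Top-1 and Top-5 definitions are both instances of top-k accuracy, which can be sketched as follows. The function name `top_k_accuracy` and the four example outputs are made up for illustration. Because the example has only three labels, it evaluates k = 1 and k = 2 rather than k = 5.

```python
def top_k_accuracy(outputs, true_indices, k):
    """Return the fraction of examples whose true label is among the k
    highest-probability entries of the model output."""
    hits = 0
    for probabilities, true_index in zip(outputs, true_indices):
        # Indices of the labels, ordered from most to least probable.
        ranked = sorted(
            range(len(probabilities)),
            key=lambda i: probabilities[i],
            reverse=True,
        )
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(outputs)


# Made-up outputs for four images over the labels [rabbit, hamster, dog].
outputs = [
    [0.07, 0.02, 0.91],  # true label: dog (index 2), ranked first
    [0.50, 0.40, 0.10],  # true label: hamster (index 1), ranked second
    [0.80, 0.15, 0.05],  # true label: rabbit (index 0), ranked first
    [0.60, 0.30, 0.10],  # true label: dog (index 2), ranked third
]
true_indices = [2, 1, 0, 2]

print(top_k_accuracy(outputs, true_indices, k=1))  # prints 0.5
print(top_k_accuracy(outputs, true_indices, k=2))  # prints 0.75
```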

Top-5 accuracy for our quantized MobileNet models ranges from 64.4% to 89.9%.

Size

The size of a model on disk varies with its performance and accuracy. Size may be important for mobile development (where it might impact app download sizes) or when deploying to hardware where available storage is limited.

The size of our quantized MobileNet models ranges from 0.5 MB to 3.4 MB.

Architecture

Several model architectures are available on our List of hosted models, indicated by each model’s name. For example, you can choose between MobileNet, Inception, and others.

The architecture of a model impacts its performance, accuracy, and size. All of our hosted models are trained on the same data, meaning you can use the provided statistics to compare them and choose which is optimal for your application.
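
As a rough sketch of how such a comparison might be automated, the following code picks the most accurate model that stays within latency and size limits. The `ModelStats` class, the `choose_model` function, the model names, and all of the numbers are invented placeholders for illustration; substitute the statistics published for the models you are actually considering.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    latency_ms: float
    top5_accuracy: float  # percent
    size_mb: float


def choose_model(candidates, max_latency_ms, max_size_mb):
    """Return the most accurate model that meets both limits, or None."""
    eligible = [
        model for model in candidates
        if model.latency_ms <= max_latency_ms and model.size_mb <= max_size_mb
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda model: model.top5_accuracy)


# Placeholder entries with invented numbers, for illustration only.
candidates = [
    ModelStats("model_small", latency_ms=5.0, top5_accuracy=65.0, size_mb=0.5),
    ModelStats("model_medium", latency_ms=20.0, top5_accuracy=80.0, size_mb=1.5),
    ModelStats("model_large", latency_ms=80.0, top5_accuracy=90.0, size_mb=3.4),
]

chosen = choose_model(candidates, max_latency_ms=33.0, max_size_mb=2.0)
print(chosen.name if chosen else "no model fits")  # prints "model_medium"
```

Treating latency and size as hard limits and accuracy as the quantity to maximize is only one possible policy. An application with a strict accuracy requirement might instead filter on accuracy and pick the smallest or fastest model.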

Customize model

The pre-trained models we provide are trained to recognize 1,000 image classes. For a full list of classes, see the labels file in the model zip.

You can use a technique known as transfer learning to re-train a model to recognize classes not in the original set. For example, you could re-train the model to distinguish between different species of tree, despite there being no trees in the original training data. To do this, you will need a set of training images for each of the new labels you wish to train.

Learn how to perform transfer learning in the Recognize flowers with TensorFlow codelab.