Performance benchmarks

This document lists TensorFlow Lite performance benchmarks for well-known models running on selected Android and iOS devices.

These performance benchmark numbers were generated with the Android TFLite benchmark binary and the iOS benchmark app.

Android performance benchmarks

For the Android benchmarks, CPU affinity is set to the device's big cores to reduce variance between runs (see details).
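As a rough illustration of pinning to big cores, the sketch below builds a hexadecimal affinity mask from a list of core indices and prints a `taskset` invocation that uses it. The helper name `affinity_mask` and the file name `model.tflite` are hypothetical. Treating cores 4-7 as the big cores is an assumption that holds only on some devices, so check the CPU layout of the device under test.

```shell
#!/bin/sh
# Build a hex CPU-affinity mask from core indices (hypothetical helper).
affinity_mask() {
  mask=0
  for core in "$@"; do
    # Set the bit that corresponds to this core index.
    mask=$(( mask | (1 << core) ))
  done
  printf '%x\n' "$mask"
}

# Assumption: cores 4-7 are the big cores on this device.
MASK=$(affinity_mask 4 5 6 7)

# Print (rather than run) the pinned benchmark command; model.tflite is a placeholder.
echo "adb shell taskset ${MASK} /data/local/tmp/benchmark_model --graph=/data/local/tmp/tflite_models/model.tflite"
```

The command is only printed, so the sketch can be tried without a device attached.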

The benchmark command assumes that the models were downloaded and unzipped to the /data/local/tmp/tflite_models directory. The benchmark binary is built using these instructions and is assumed to be in the /data/local/tmp directory.
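One possible way to get the files into those locations is sketched below. The helper `stage_cmds` is hypothetical and both local paths are placeholders. It only prints the `adb` commands so that they can be reviewed before being run against a device.

```shell
#!/bin/sh
# Print the adb commands that stage the binary and one model on the device.
# stage_cmds is a hypothetical helper; both arguments are local placeholder paths.
stage_cmds() {
  binary=$1
  model=$2
  echo "adb push ${binary} /data/local/tmp/benchmark_model"
  echo "adb shell chmod +x /data/local/tmp/benchmark_model"
  echo "adb shell mkdir -p /data/local/tmp/tflite_models"
  echo "adb push ${model} /data/local/tmp/tflite_models/"
}

stage_cmds ./benchmark_model ./model.tflite
```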

To run the benchmark:

adb shell /data/local/tmp/benchmark_model \
  --num_threads=4 \
  --graph=/data/local/tmp/tflite_models/${GRAPH} \
  --warmup_runs=1 \
  --num_runs=50

To run with the NNAPI delegate, set --use_nnapi=true. To run with the GPU delegate, set --use_gpu=true.
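To make the three configurations concrete, here is a minimal sketch that assembles the command line for the CPU, GPU, or NNAPI variant of a run. The helper `benchmark_cmd` and the file name `model.tflite` are hypothetical. The flags are the ones given above.

```shell
#!/bin/sh
# Print the benchmark command for one model and one backend.
# benchmark_cmd is a hypothetical helper; the backend defaults to cpu.
benchmark_cmd() {
  graph=$1
  backend=${2:-cpu}
  case "$backend" in
    cpu)   extra="" ;;
    gpu)   extra=" --use_gpu=true" ;;
    nnapi) extra=" --use_nnapi=true" ;;
    *)     echo "unknown backend: $backend" >&2; return 1 ;;
  esac
  echo "adb shell /data/local/tmp/benchmark_model --num_threads=4 --graph=/data/local/tmp/tflite_models/${graph} --warmup_runs=1 --num_runs=50${extra}"
}

# Print the three variants for a placeholder model file name.
for delegate in cpu gpu nnapi; do
  benchmark_cmd model.tflite "$delegate"
done
```

With a device attached, a printed line can be executed directly, for example with `eval "$(benchmark_cmd model.tflite gpu)"`.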

The performance values below are measured on Android 10.

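| Model Name                | Device  | CPU, 4 threads | GPU     | NNAPI    |
|---------------------------|---------|----------------|---------|----------|
| Mobilenet_1.0_224 (float) | Pixel 3 | 23.9 ms        | 6.45 ms | 13.8 ms  |
| Mobilenet_1.0_224 (float) | Pixel 4 | 14.0 ms        | 9.0 ms  | 14.8 ms  |
| Mobilenet_1.0_224 (quant) | Pixel 3 | 13.4 ms        | ---     | 6.0 ms   |
| Mobilenet_1.0_224 (quant) | Pixel 4 | 5.0 ms         | ---     | 3.2 ms   |
| NASNet mobile             | Pixel 3 | 56 ms          | ---     | 102 ms   |
| NASNet mobile             | Pixel 4 | 34.5 ms        | ---     | 99.0 ms  |
| SqueezeNet                | Pixel 3 | 35.8 ms        | 9.5 ms  | 18.5 ms  |
| SqueezeNet                | Pixel 4 | 23.9 ms        | 11.1 ms | 19.0 ms  |
| Inception_ResNet_V2       | Pixel 3 | 422 ms         | 99.8 ms | 201 ms   |
| Inception_ResNet_V2       | Pixel 4 | 272.6 ms       | 87.2 ms | 171.1 ms |
| Inception_V4              | Pixel 3 | 486 ms         | 93 ms   | 292 ms   |
| Inception_V4              | Pixel 4 | 324.1 ms       | 97.6 ms | 186.9 ms |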

iOS performance benchmarks

For the iOS benchmarks, the benchmark app was modified to include the appropriate model, and benchmark_params.json was modified to set num_threads to 2. For the GPU delegate, the "use_gpu" : "1" and "gpu_wait_type" : "aggressive" options were also added to benchmark_params.json.
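The entries described above can be pictured with the following sketch, which prints only the benchmark_params.json keys named in the text. The helper `ios_params` is hypothetical, the other keys in the real file are omitted, and quoting the num_threads value as a string is an assumption modeled on the "use_gpu" : "1" entry.

```shell
#!/bin/sh
# Print the benchmark_params.json entries named in the text (not the whole file).
# ios_params is a hypothetical helper; pass 1 to include the GPU delegate options.
ios_params() {
  use_gpu=${1:-0}
  # Assumption: num_threads is stored as a quoted string, like "use_gpu".
  printf '"num_threads" : "2"'
  if [ "$use_gpu" = "1" ]; then
    printf ',\n"use_gpu" : "1",\n"gpu_wait_type" : "aggressive"'
  fi
  printf '\n'
}

# Entries for a GPU delegate run.
ios_params 1
```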

| Model Name                | Device    | CPU, 2 threads | GPU     |
|---------------------------|-----------|----------------|---------|
| Mobilenet_1.0_224 (float) | iPhone XS | 14.8 ms        | 3.4 ms  |
| Mobilenet_1.0_224 (quant) | iPhone XS | 11 ms          | ---     |
| NASNet mobile             | iPhone XS | 30.4 ms        | ---     |
| SqueezeNet                | iPhone XS | 21.1 ms        | 15.5 ms |
| Inception_ResNet_V2       | iPhone XS | 261.1 ms       | 45.7 ms |
| Inception_V4              | iPhone XS | 309 ms         | 54.4 ms |