Using graphics processing units (GPUs) to run your machine learning (ML) models can dramatically improve the performance and the user experience of your ML-enabled applications. On Android devices, you can enable GPU-accelerated execution of your models using a delegate.
This guide covers advanced uses of the GPU delegate with the C and C++ APIs, as well as the use of quantized models. For more information about using the GPU delegate for TensorFlow Lite, including best practices and advanced techniques, see the GPU delegates page.
Enable GPU acceleration
Use the TensorFlow Lite GPU delegate for Android in C or C++ by creating the delegate with TfLiteGpuDelegateV2Create() and destroying it with TfLiteGpuDelegateV2Delete(), as shown in the following example code:
```c++
// Set up interpreter.
auto model = FlatBufferModel::BuildFromFile(model_path);
if (!model) return false;
ops::builtin::BuiltinOpResolver op_resolver;
std::unique_ptr<Interpreter> interpreter;
InterpreterBuilder(*model, op_resolver)(&interpreter);

// NEW: Prepare GPU delegate.
auto* delegate = TfLiteGpuDelegateV2Create(/*default options=*/nullptr);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

// Run inference.
WriteToInputTensor(interpreter->typed_input_tensor<float>(0));
if (interpreter->Invoke() != kTfLiteOk) return false;
ReadFromOutputTensor(interpreter->typed_output_tensor<float>(0));

// NEW: Clean up.
TfLiteGpuDelegateV2Delete(delegate);
```
Pass a TfLiteGpuDelegateOptionsV2 object to build a delegate instance with custom options. You can initialize the default options with TfLiteGpuDelegateOptionsV2Default() and then modify them as necessary.
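As a sketch of this pattern, the following snippet overrides a few of the option fields declared in tensorflow/lite/delegates/gpu/delegate.h; which fields you set (and their exact names) should be checked against the header for your TensorFlow Lite version:

```c++
#include "tensorflow/lite/delegates/gpu/delegate.h"

// Start from the defaults, then override only what you need.
TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
options.is_precision_loss_allowed = 1;  // allow reduced-precision (FP16) math
options.inference_preference =
    TFLITE_GPU_INFERENCE_PREFERENCE_SUSTAINED_SPEED;

auto* delegate = TfLiteGpuDelegateV2Create(&options);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;
// ... run inference ...
TfLiteGpuDelegateV2Delete(delegate);
```

Note that TfLiteGpuDelegateV2Create() takes a pointer to the options struct; the struct itself only needs to outlive the call, since the delegate copies the values it needs.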
The TensorFlow Lite GPU delegate for Android in C or C++ uses the Bazel build system. You can build the delegate using the following command:
```sh
bazel build -c opt --config android_arm64 tensorflow/lite/delegates/gpu:delegate                       # for static library
bazel build -c opt --config android_arm64 tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so  # for dynamic library
```
Note: When calling Interpreter::ModifyGraphWithDelegate() or Interpreter::Invoke(), the caller must have an EGLContext in the current thread, and Interpreter::Invoke() must be called from the same EGLContext. If an EGLContext does not exist, the delegate creates one internally, but then you must ensure that Interpreter::Invoke() is always called from the same thread in which Interpreter::ModifyGraphWithDelegate() was called.
Android GPU delegate libraries support quantized models by default. You do not have to make any code changes to use quantized models with the GPU delegate. The following section explains how to disable quantized support for testing or experimental purposes.
Disable quantized model support
The following code shows how to disable support for quantized models.
```c++
TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
options.experimental_flags = TFLITE_GPU_EXPERIMENTAL_FLAGS_NONE;

auto* delegate = TfLiteGpuDelegateV2Create(&options);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;
```
For more information about running quantized models with GPU acceleration, see GPU delegate overview.