Integrate audio classifiers

Audio classification is a common machine learning use case: identifying the type of sound in an audio clip. For example, a classifier can identify bird species from their songs.

The Task Library AudioClassifier API can be used to deploy your custom audio classifiers or pretrained ones into your mobile app.

Key features of the AudioClassifier API

  • Input audio processing, e.g. converting PCM 16 bit encoding to PCM Float encoding and the manipulation of the audio ring buffer.

  • Label map locale.

  • Support for multi-head classification models.

  • Support for both single-label and multi-label classification.

  • Score threshold to filter results.

  • Top-k classification results.

  • Label allowlist and denylist.
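
The result-filtering features in the list above (score threshold, top-k, and label allowlist/denylist) can be pictured with a small pure-Python sketch. It does not call the Task Library; `Category` and `filter_categories` are hypothetical names introduced only for illustration, and details such as treating the allowlist and denylist as mutually exclusive, or keeping scores equal to the threshold, are assumptions rather than a statement of the library's exact behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Category:
    """Hypothetical stand-in for one classification result entry."""
    label: str
    score: float


def filter_categories(
    categories: Iterable[Category],
    score_threshold: float = 0.0,
    max_results: Optional[int] = None,
    allowlist: Optional[Iterable[str]] = None,
    denylist: Optional[Iterable[str]] = None,
) -> List[Category]:
    """Apply allow/deny lists and a score threshold, then keep the top-k by score."""
    # Assumption: only one of the two lists may be given at a time.
    if allowlist is not None and denylist is not None:
        raise ValueError("allowlist and denylist are mutually exclusive")
    allowed = set(allowlist) if allowlist is not None else None
    denied = set(denylist) if denylist is not None else set()

    kept = [
        c for c in categories
        if c.score >= score_threshold
        and (allowed is None or c.label in allowed)
        and c.label not in denied
    ]
    # Highest score first, then truncate to the requested number of results.
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept if max_results is None else kept[:max_results]


scores = [Category("dog_bark", 0.72), Category("speech", 0.15),
          Category("siren", 0.08), Category("silence", 0.05)]
top = filter_categories(scores, score_threshold=0.1, max_results=2,
                        denylist=["speech"])
print([c.label for c in top])  # prints ['dog_bark']
```

In this example the threshold removes "siren" and "silence", the denylist removes "speech", and the top-k limit of 2 is then not reached, so a single category remains.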

Supported audio classifier models

The following models are guaranteed to be compatible with the AudioClassifier API.

Run inference in Java

See the Audio Classification reference app for an example using AudioClassifier in an Android app.

Step 1: Import Gradle dependency and other settings

Copy the .tflite model file to the assets directory of the Android module where the model will be run. Specify that the file should not be compressed, and add the TensorFlow Lite library to the module’s build.gradle file:

android {
    // Other settings

    // Specify that the tflite file should not be compressed when building the APK package.
    aaptOptions {
        noCompress "tflite"
    }
}

dependencies {
    // Other dependencies

    // Import the Audio Task Library dependency (NNAPI is included)
    implementation 'org.tensorflow:tensorflow-lite-task-audio:0.4.0'
    // Import the GPU delegate plugin Library for GPU inference
    implementation 'org.tensorflow:tensorflow-lite-gpu-delegate-plugin:0.4.0'
}

Step 2: Using the model

// Initialization
AudioClassifierOptions options =
    AudioClassifierOptions.builder()
        .setBaseOptions(BaseOptions.builder().useGpu().build())
        .setMaxResults(1)
        .build();
AudioClassifier classifier =
    AudioClassifier.createFromFileAndOptions(context, modelFile, options);

// Start recording
AudioRecord record = classifier.createAudioRecord();
record.startRecording();

// Load latest audio samples
TensorAudio audioTensor = classifier.createInputTensorAudio();
audioTensor.load(record);

// Run inference
List<Classifications> results = classifier.classify(audioTensor);

See the source code and javadoc for more options to configure AudioClassifier.

Run inference in Python

Step 1: Install the pip package

pip install tflite-support

  • Linux: Run sudo apt-get update && sudo apt-get install libportaudio2
  • Mac and Windows: PortAudio is installed automatically when installing the tflite-support pip package.

Step 2: Using the model

# Imports
from tflite_support.task import audio
from tflite_support.task import core
from tflite_support.task import processor

# Initialization
base_options = core.BaseOptions(file_name=model_path)
classification_options = processor.ClassificationOptions(max_results=2)
options = audio.AudioClassifierOptions(base_options=base_options, classification_options=classification_options)
classifier = audio.AudioClassifier.create_from_options(options)

# Alternatively, you can create an audio classifier in the following manner:
# classifier = audio.AudioClassifier.create_from_file(model_path)

# Run inference
audio_file = audio.TensorAudio.create_from_wav_file(audio_path, classifier.required_input_buffer_size)
audio_result = classifier.classify(audio_file)

See the source code for more options to configure AudioClassifier.
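
To give a feel for the input audio processing that the API handles for you (converting PCM 16-bit samples to floats and maintaining an audio ring buffer), here is a minimal pure-Python sketch. It does not use tflite_support; `FloatRingBuffer`, `load_pcm16`, and `to_list` are hypothetical names, and scaling by 1/32768 is an assumption about the normalization, not necessarily the constant the library uses.

```python
import struct
from collections import deque
from typing import List


class FloatRingBuffer:
    """Fixed-capacity buffer that always holds the most recent samples."""

    def __init__(self, capacity: int) -> None:
        # Start zero-filled so the buffer always contains exactly `capacity` samples.
        self._samples = deque([0.0] * capacity, maxlen=capacity)

    def load_pcm16(self, pcm_bytes: bytes) -> None:
        """Append little-endian 16-bit PCM samples, converted to floats in [-1, 1)."""
        usable = (len(pcm_bytes) // 2) * 2  # ignore a trailing odd byte, if any
        for (value,) in struct.iter_unpack("<h", pcm_bytes[:usable]):
            # Dividing by 32768 maps the int16 range onto [-1.0, 1.0) (assumed scaling).
            # Appending to a full deque silently drops the oldest sample.
            self._samples.append(value / 32768.0)

    def to_list(self) -> List[float]:
        """Return the buffered samples, oldest first."""
        return list(self._samples)


# Load six samples into a buffer of size four: only the last four survive.
buffer = FloatRingBuffer(capacity=4)
buffer.load_pcm16(struct.pack("<6h", 0, 16384, -16384, 32767, -32768, 8192))
print(buffer.to_list())
```

A classifier's required input buffer size would play the role of `capacity` here, so each inference sees a fixed-length window of the newest audio.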

Run inference in C++

// Initialization
AudioClassifierOptions options;
options.mutable_base_options()->mutable_model_file()->set_file_name(model_path);
std::unique_ptr<AudioClassifier> audio_classifier = AudioClassifier::CreateFromOptions(options).value();

// Create input audio buffer from data.
int input_buffer_size = audio_classifier->GetRequiredInputBufferSize();
const std::unique_ptr<AudioBuffer> audio_buffer =
    AudioBuffer::Create(audio_data.get(), input_buffer_size, kAudioFormat).value();

// Run inference
const ClassificationResult result = audio_classifier->Classify(*audio_buffer).value();

See the source code for more options to configure AudioClassifier.

Model compatibility requirements

The AudioClassifier API expects a TFLite model with mandatory TFLite Model Metadata. See examples of creating metadata for audio classifiers using the TensorFlow Lite Metadata Writer API.

The compatible audio classifier models should meet the following requirements:

  • Input audio tensor (kTfLiteFloat32)

    • audio clip of size [batch x samples].
    • batch inference is not supported (batch is required to be 1).
    • for multi-channel models, the channels need to be interleaved.
  • Output score tensor (kTfLiteFloat32)

    • [1 x N] array, where N is the number of classes.
    • optional (but recommended) label map(s) as AssociatedFile-s with type TENSOR_AXIS_LABELS, containing one label per line. The first such AssociatedFile (if any) is used to fill the label field (named as class_name in C++) of the results. The display_name field is filled from the AssociatedFile (if any) whose locale matches the display_names_locale field of the AudioClassifierOptions used at creation time ("en" by default, i.e. English). If none of these are available, only the index field of the results will be filled.
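
The label-resolution rules described above can be sketched in a few lines of pure Python. This is only an illustration of the stated rules, not the library's implementation: `parse_label_file` and `resolve_names` are hypothetical helpers, and representing each label map as a `(locale, labels)` pair is an assumption made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of one label map: (locale or None, labels by class index).
LabelFile = Tuple[Optional[str], List[str]]


def parse_label_file(text: str) -> List[str]:
    """One label per line; line i holds the label of class index i."""
    return text.splitlines()


def resolve_names(
    index: int,
    label_files: List[LabelFile],
    display_names_locale: str = "en",
) -> Dict[str, object]:
    """Fill label from the first file and display_name from the locale-matching file."""
    # The first label map (if any) provides the label field.
    label = label_files[0][1][index] if label_files else None
    # The label map whose locale matches provides the display_name field.
    display_name = None
    for locale, labels in label_files:
        if locale == display_names_locale:
            display_name = labels[index]
            break
    # With no label maps at all, only the index is meaningful.
    return {"index": index, "label": label, "display_name": display_name}


default_labels = parse_label_file("dog_bark\nspeech\n")
french_labels = parse_label_file("aboiement\nparole\n")
files = [(None, default_labels), ("fr", french_labels)]

print(resolve_names(1, files, display_names_locale="fr"))
print(resolve_names(1, files))  # default locale "en" has no matching file
print(resolve_names(1, []))     # no label maps: index only
```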