This tutorial shows you how to use TensorFlow Lite with pre-built machine learning models to recognize sounds and spoken words in an Android app. Audio classification models like the ones shown in this tutorial can be used to detect activity, identify actions, or recognize voice commands.
This tutorial shows you how to download the example code, load the project into Android Studio, and explains key parts of the code example so you can start adding this functionality to your own app. The example app code uses the TensorFlow Task Library for Audio, which handles most of the audio data recording and preprocessing. For more information on how audio is pre-processed for use with machine learning models, see Audio Data Preparation and Augmentation.
Audio classification with machine learning
The machine learning model in this tutorial recognizes sounds or words from audio samples recorded with a microphone on an Android device. The example app in this tutorial allows you to switch between the YAMNet/classifier, a model that recognizes sounds, and a model that recognizes specific spoken words, that was trained using the TensorFlow Lite Model Maker tool. The models run predictions on audio clips that contain 15600 individual samples per clip and are about 1 second in length.
Setup and run example
For the first part of this tutorial, you download the sample from GitHub and run it using Android Studio. The following sections of this tutorial explore the relevant sections of the example, so you can apply them to your own Android apps.
System requirements
- Android Studio version 2021.1.1 (Bumblebee) or higher.
- Android SDK version 31 or higher
- Android device with a minimum OS version of SDK 24 (Android 7.0 - Nougat) with developer mode enabled.
Get the example code
Create a local copy of the example code. You will use this code to create a project in Android Studio and run the sample application.
To clone and setup the example code:
- Clone the git repository
git clone https://github.com/tensorflow/examples.git
Optionally, configure your git instance to use sparse checkout, so you have only the files for the example app:
cd examples git sparse-checkout init --cone git sparse-checkout set lite/examples/audio_classification/android
Import and run the project
Create a project from the downloaded example code, build the project, and then run it.
To import and build the example code project:
- Start Android Studio.
- In Android Studio, choose File > New > Import Project.
- Navigate to the example code directory containing the
build.gradle
file (.../examples/lite/examples/audio_classification/android/build.gradle
) and select that directory.
If you select the correct directory, Android Studio creates a new project and
builds it. This process can take a few minutes, depending on the speed of your
computer and if you have used Android Studio for other projects. When the build
completes, the Android Studio displays a BUILD SUCCESSFUL
message in the
Build Output status panel.
To run the project:
- From Android Studio, run the project by selecting Run > Run 'app'.
- Select an attached Android device with a microphone to test the app.
The next sections show you the modifications you need to make to your existing project to add this functionality to your own app, using this example app as a reference point.
Add project dependencies
In your own application, you must add specific project dependencies to run TensorFlow Lite machine learning models, and access utility functions that convert standard data formats, such as audio, into a tensor data format that can be processed by the model you are using.
The example app uses the following TensorFlow Lite libraries:
- TensorFlow Lite Task library Audio API - Provides the required audio data input classes, execution of the machine learning model, and output results from the model processing.
The following instructions show how to add the required project dependencies to your own Android app project.
To add module dependencies:
In the module that uses TensorFlow Lite, update the module's
build.gradle
file to include the following dependencies. In the example code, this file is located here:.../examples/lite/examples/audio_classification/android/build.gradle
dependencies { ... implementation 'org.tensorflow:tensorflow-lite-task-audio' }
In Android Studio, sync the project dependencies by selecting: File > Sync Project with Gradle Files.
Initialize the ML model
In your Android app, you must initialize the TensorFlow Lite machine learning model with parameters before running predictions with the model. These initialization parameters are dependent on the model and can include settings such as default minimum accuracy thresholds for predictions and labels for words or sounds that the model can recognize.
A TensorFlow Lite model includes a *.tflite
file containing the model. The
model file contains the prediction logic and typically includes metadata
about how to interpret prediction results, such as prediction class names. Model
files should be stored in the src/main/assets
directory of your development
project, as in the code example:
<project>/src/main/assets/yamnet.tflite
For convenience and code readability, the example declares a companion object that defines the settings for the model.
To initialize the model in your app:
Create a companion object to define the settings for the model:
companion object { const val DISPLAY_THRESHOLD = 0.3f const val DEFAULT_NUM_OF_RESULTS = 2 const val DEFAULT_OVERLAP_VALUE = 0.5f const val YAMNET_MODEL = "yamnet.tflite" const val SPEECH_COMMAND_MODEL = "speech.tflite" }
Create the settings for the model by building an
AudioClassifier.AudioClassifierOptions
object:val options = AudioClassifier.AudioClassifierOptions.builder() .setScoreThreshold(classificationThreshold) .setMaxResults(numOfResults) .setBaseOptions(baseOptionsBuilder.build()) .build()
Use this settings object to construct a TensorFlow Lite
AudioClassifier
object that contains the model:classifier = AudioClassifier.createFromFileAndOptions(context, "yamnet.tflite", options)
Enable hardware acceleration
When initializing a TensorFlow Lite model in your app, you should consider using hardware acceleration features to speed up the prediction calculations of the model. TensorFlow Lite delegates are software modules that accelerate execution of machine learning models using specialized processing hardware on a mobile device, such as graphics processing unit (GPUs) or tensor processing units (TPUs). The code example uses the NNAPI Delegate to handle hardware acceleration of the model execution:
val baseOptionsBuilder = BaseOptions.builder()
.setNumThreads(numThreads)
...
when (currentDelegate) {
DELEGATE_CPU -> {
// Default
}
DELEGATE_NNAPI -> {
baseOptionsBuilder.useNnapi()
}
}
Using delegates for running TensorFlow Lite models is recommended, but not required. For more information about using delegates with TensorFlow Lite, see TensorFlow Lite Delegates.
Prepare data for the model
In your Android app, your code provides data to the model for interpretation by transforming existing data such as audio clips into a Tensor data format that can be processed by your model. The data in a Tensor you pass to a model must have specific dimensions, or shape, that matches the format of data used to train the model.
The YAMNet/classifier model and the customized speech commands models used in this code example accepts Tensor data objects that represent single-channel, or mono, audio clips recorded at 16kHz in 0.975 second clips (15600 samples). Running predictions on new audio data, your app must transform that audio data into Tensor data objects of that size and shape. The TensorFlow Lite Task Library Audio API handles the data transformation for you.
In the example code AudioClassificationHelper
class, the app records live
audio from the device microphones using an Android
AudioRecord
object. The code uses
AudioClassifier
to build and configure that object to record audio at a sampling rate
appropriate for the model. The code also uses AudioClassifier to build a
TensorAudio
object to store the transformed audio data. Then the TensorAudio object is
passed to the model for analysis.
To provide audio data to the ML model:
Use the
AudioClassifier
object to create aTensorAudio
object and aAudioRecord
object:fun initClassifier() { ... try { classifier = AudioClassifier.createFromFileAndOptions(context, currentModel, options) // create audio input objects tensorAudio = classifier.createInputTensorAudio() recorder = classifier.createAudioRecord() }
Run predictions
In your Android app, once you have connected an AudioRecord object and a TensorAudio object to an AudioClassifier object, you can run the model against that data to produce a prediction, or inference. The example code for this tutorial runs predictions on clips from a live-recorded audio input stream at a specific rate.
Model execution consumes significant resources, so it's important to run ML
model predictions on a separate, background thread. The example app uses a
[ScheduledThreadPoolExecutor](https://developer.android.com/reference/java/util/concurrent/ScheduledThreadPoolExecutor)
object to isolate the model processing from other functions of the app.
Audio classification models that recognize sounds with a clear beginning and
end, such as words, can produce more accurate predictions on an incoming audio
stream by analyzing overlapping audio clips. This approach
helps the model avoid missing predictions for words that are cut off at the end
of a clip. In the example app, each time you run a prediction the code grabs the
latest 0.975 second clip from the audio recording buffer and analyzes it. You
can make the model analyze overlapping audio clips by setting the model analysis
thread execution pool interval
value to a length that's shorter than the
length of the clips being analyzed. For example, if your model analyzes 1 second
clips and you set the interval to 500 milliseconds, the model will analyze the
last half of the previous clip and 500 milliseconds of new audio data each time,
creating a clip analysis overlap of 50%.
To start running predictions on the audio data:
Use the
AudioClassificationHelper.startAudioClassification()
method to start the audio recording for the model:fun startAudioClassification() { if (recorder.recordingState == AudioRecord.RECORDSTATE_RECORDING) { return } recorder.startRecording() }
Set how frequently the model generates an inference from the audio clips by setting a fixed rate
interval
in theScheduledThreadPoolExecutor
object:executor = ScheduledThreadPoolExecutor(1) executor.scheduleAtFixedRate( classifyRunnable, 0, interval, TimeUnit.MILLISECONDS)
The
classifyRunnable
object in the code above executes theAudioClassificationHelper.classifyAudio()
method, which loads the latest available audio data from the recorder and performs a prediction:private fun classifyAudio() { tensorAudio.load(recorder) val output = classifier.classify(tensorAudio) ... }
Stop prediction processing
Make sure your app code stops doing audio classification when your app's audio
processing Fragment or Activity loses focus. Running a machine learning model
continuously has a significant impact on the battery life of an Android device.
Use the onPause()
method of the Android activity or fragment associated with
the audio classification to stop audio recording and prediction processing.
To stop audio recording and classification:
Use the
AudioClassificationHelper.stopAudioClassification()
method to stop recording and the model execution, as shown below in theAudioFragment
class:override fun onPause() { super.onPause() if (::audioHelper.isInitialized ) { audioHelper.stopAudioClassification() } }
Handle model output
In your Android app, after you process an audio clip, the model produces a list
of predictions which your app code must handle by executing additional business
logic, displaying results to the user, or taking other actions. The output of
any given TensorFlow Lite model varies in terms of the number of predictions it
produces (one or many), and the descriptive information for each prediction. In
the case of the models in the example app, the predictions are either a list of
recognized sounds or words. The AudioClassifier options object used in the code
example lets you set the maximum number of predictions with the
setMaxResults()
method, as shown in Initialize the ML
model section.
To get the prediction results from the model:
Get the results of the AudioClassifier object's
classify()
method and pass them to the listener object (code reference):private fun classifyAudio() { ... val output = classifier.classify(tensorAudio) listener.onResult(output[0].categories, inferenceTime) }
Use the listener's onResult() function handle the output by executing business logic or displaying results to the user:
private val audioClassificationListener = object : AudioClassificationListener { override fun onResult(results: List<Category>, inferenceTime: Long) { requireActivity().runOnUiThread { adapter.categoryList = results adapter.notifyDataSetChanged() fragmentAudioBinding.bottomSheetLayout.inferenceTimeVal.text = String.format("%d ms", inferenceTime) } }
The model used in this example generates a list of predictions with a label for the classified sound or word, and a prediction score between 0 and 1 as a Float representing the confidence of the prediction, with 1 being the highest confidence rating. In general, predictions with a score below 50% (0.5) are considered inconclusive. However, how you handle low-value prediction results is up to you and the needs of your application.
Once the model has returned a set of prediction results, your application can act on those predictions by presenting the result to your user or executing additional logic. In the case of the example code, the application lists the identified sounds or words in the app user interface.
Next steps
You can find additional TensorFlow Lite models for audio processing on TensorFlow Hub and through the Pre-trained models guide page. For more information about implementing machine learning in your mobile application with TensorFlow Lite, see the TensorFlow Lite Developer Guide.