- Description:
A free audio dataset of spoken digits. Think MNIST for audio.
A simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends.
5 speakers; 2,500 recordings (50 of each digit per speaker); English pronunciations.
Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav
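To illustrate the naming scheme, the sketch below splits a file name into its digit label, speaker name, and index. It assumes the three parts are separated by underscores, as in the upstream repository (for example `7_jackson_32.wav`). `Recording` and `parse_fsdd_filename` are hypothetical helpers written for this example; they are not part of TFDS.

```python
import re
from typing import NamedTuple


class Recording(NamedTuple):
    """Parts of a file name (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


# Assumed layout: one digit, a speaker name, and an index, joined by underscores.
_PATTERN = re.compile(r"^(\d)_([A-Za-z]+)_(\d+)\.wav$")


def parse_fsdd_filename(filename: str) -> Recording:
    """Split a name such as '7_jackson_32.wav' into its three parts."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    digit, speaker, index = match.groups()
    return Recording(int(digit), speaker, int(index))


print(parse_fsdd_filename("7_jackson_32.wav"))
# Recording(digit=7, speaker='jackson', index=32)
```

The same parser can be applied to the `audio/filename` feature, for example to group recordings by speaker.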
Homepage: https://github.com/Jakobovski/free-spoken-digit-dataset
Source code:
tfds.audio.spoken_digit.SpokenDigit
Versions:
1.0.9 (default): No release notes.
Download size:
11.42 MiB
Dataset size:
45.68 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 2,500 |
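Because only a `'train'` split is provided, a held-out set has to be carved out by the user. A minimal sketch using the TFDS split-slicing syntax follows; the 10% hold-out size and the helper names `split_specs` and `load_train_test` are assumptions made for this example. A percentage slice does not separate speakers, so the same speakers can appear in both parts.

```python
from typing import Tuple


def split_specs(test_percent: int = 10) -> Tuple[str, str]:
    """Build TFDS slice strings that cut 'train' into two parts."""
    if not 0 < test_percent < 100:
        raise ValueError("test_percent must be between 1 and 99")
    cut = 100 - test_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_train_test(test_percent: int = 10):
    """Load both parts; the first call downloads the dataset."""
    # Imported lazily so the helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    train_spec, test_spec = split_specs(test_percent)
    # Passing a list of split strings returns one dataset per string.
    return tfds.load("spoken_digit", split=[train_spec, test_spec])


print(split_specs(10))  # ('train[:90%]', 'train[90%:]')
```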
- Features:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=tf.int64),
'audio/filename': Text(shape=(), dtype=tf.string),
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})
Supervised keys (See as_supervised doc): ('audio', 'label')
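With `as_supervised=True`, each example is returned as an `(audio, label)` tuple rather than a feature dictionary. The sketch below shows one way to iterate over such pairs. Since the audio has a variable length (`shape=(None,)`), it also pads or trims each clip to a fixed size. The one-second target length (8,000 samples at 8 kHz) and the helper names `pad_or_trim` and `iterate_supervised` are assumptions made for this example.

```python
from typing import Iterator, List, Sequence, Tuple

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the description


def pad_or_trim(samples: Sequence[int], length: int = SAMPLE_RATE_HZ) -> List[int]:
    """Return exactly `length` samples: cut long clips, zero-pad short ones."""
    clipped = list(samples[:length])
    return clipped + [0] * (length - len(clipped))


def iterate_supervised(limit: int = 3) -> Iterator[Tuple[List[int], int]]:
    """Yield (fixed-length audio, label) pairs; downloads data on first use."""
    # Imported lazily so pad_or_trim works without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("spoken_digit", split="train", as_supervised=True)
    for audio, label in tfds.as_numpy(ds.take(limit)):
        yield pad_or_trim(audio.tolist()), int(label)
```

Calling `list(iterate_supervised())` would then give up to three pairs, each with 8,000 audio samples and an integer label from 0 to 9.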
Citation:
@ONLINE {Free Spoken Digit Dataset,
author = "Zohar Jackson",
title = "Spoken_Digit",
year = "2016",
url = "https://github.com/Jakobovski/free-spoken-digit-dataset"
}
Figure (tfds.show_examples): Not supported.