- Description:
A free audio dataset of spoken digits. Think MNIST for audio.
A simple audio/speech dataset consisting of recordings of spoken digits in WAV files at 8 kHz. The recordings are trimmed so that they have near-minimal silence at the beginning and end.
5 speakers
2,500 recordings (50 of each digit per speaker)
English pronunciations
Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav (for example, 7_jackson_32.wav)
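The naming scheme can be unpacked with a small parser. The following is a minimal sketch, assuming the underscore-separated form used in the upstream repository (for example `7_jackson_32.wav`); the names `RecordingName` and `parse_recording_name` are hypothetical helpers introduced here for illustration and are not part of the dataset or of TFDS.

```python
import re
from typing import NamedTuple


class RecordingName(NamedTuple):
    """Fields encoded in a recording filename (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


# Assumed layout: {digitLabel}_{speakerName}_{index}.wav
_NAME_PATTERN = re.compile(r"^(?P<digit>\d)_(?P<speaker>[^_]+)_(?P<index>\d+)\.wav$")


def parse_recording_name(filename: str) -> RecordingName:
    """Split a filename into digit label, speaker name, and recording index."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return RecordingName(
        digit=int(match["digit"]),
        speaker=match["speaker"],
        index=int(match["index"]),
    )


print(parse_recording_name("7_jackson_32.wav"))
```

The same function could be applied to the `audio/filename` feature, for instance to group recordings by speaker.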
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/Jakobovski/free-spoken-digit-dataset
Source code:
tfds.datasets.spoken_digit.Builder
Versions:
1.0.9 (default): No release notes.
Download size:
11.42 MiB
Dataset size:
45.68 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'train' | 2,500 |
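Because only a 'train' split is provided, a held-out set has to be carved out by the user. One option is the TFDS split-slicing syntax, sketched below. The 80/20 ratio and the function names `split_specs` and `load_train_and_holdout` are assumptions made for illustration; `tensorflow_datasets` is imported lazily so the helper that builds the split strings can be used without it installed.

```python
from typing import List


def split_specs(train_percent: int = 80) -> List[str]:
    """Build two TFDS slicing strings that partition 'train' into disjoint parts."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    return [f"train[:{train_percent}%]", f"train[{train_percent}%:]"]


def load_train_and_holdout(train_percent: int = 80):
    """Load both slices as (audio, label) datasets; downloads the data on first use."""
    import tensorflow_datasets as tfds  # lazy import: needs tensorflow-datasets installed

    train_ds, holdout_ds = tfds.load(
        "spoken_digit",
        split=split_specs(train_percent),
        as_supervised=True,
    )
    return train_ds, holdout_ds


print(split_specs(80))
```

Note that a percentage slice does not separate speakers, so the same voices are likely to appear in both parts; a speaker-independent evaluation would need to partition on the speaker name in `audio/filename` instead.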
- Feature structure:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=int64),
'audio/filename': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| audio | Audio | (None,) | int64 | |
| audio/filename | Text | | string | |
| label | ClassLabel | | int64 | |
Supervised keys (See as_supervised doc): ('audio', 'label')
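With `as_supervised=True`, each example is returned as an `(audio, label)` pair rather than a feature dictionary. The sketch below shows one way to iterate over a few pairs and report clip lengths. The 8 kHz rate is taken from the description above; the helper names `duration_seconds` and `print_first_examples` are hypothetical, and `tensorflow_datasets` is imported lazily inside the function that needs it.

```python
SAMPLE_RATE_HZ = 8_000  # sampling rate stated in the dataset description


def duration_seconds(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a sample count into a clip duration in seconds."""
    return num_samples / sample_rate_hz


def print_first_examples(count: int = 3) -> None:
    """Print label and duration for the first few (audio, label) pairs."""
    import tensorflow_datasets as tfds  # lazy import: needs tensorflow-datasets installed

    ds = tfds.load("spoken_digit", split="train", as_supervised=True)
    for audio, label in tfds.as_numpy(ds.take(count)):
        # audio is a 1-D int64 array of variable length; label is an integer in 0..9
        num_samples = audio.shape[0]
        print(f"label={int(label)} samples={num_samples} "
              f"duration={duration_seconds(num_samples):.3f}s")
```

Calling `print_first_examples()` downloads the dataset on first use. Since the audio shape is `(None,)`, clips differ in length and would typically need padding or truncation before batching.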
Figure (tfds.show_examples): Not supported.
- Citation:
@ONLINE {Free Spoken Digit Dataset,
author = "Zohar Jackson",
title = "Spoken_Digit",
year = "2016",
url = "https://github.com/Jakobovski/free-spoken-digit-dataset"
}