speech_commands

Description:

An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word, from a set of ten target words, is spoken, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise.

One difference from the release version is the handling of silent segments. In the test set the silence segments are regular 1-second files, whereas in the training set they are provided as long segments under the "_background_noise_" folder. Here we split this background noise into 1-second clips and keep one of the files for the validation set.
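The background-noise handling described above can be illustrated with a short sketch. This is not the builder's actual code: the helper name `split_into_clips` is hypothetical, the 16 kHz sample rate is an assumption not stated on this page, and dropping the trailing partial clip is just one possible choice.

```python
SAMPLE_RATE = 16_000  # assumed sample rate in Hz; not stated on this page


def split_into_clips(samples, clip_len=SAMPLE_RATE):
    """Split a long recording into consecutive, non-overlapping clips.

    `samples` can be any sliceable sequence (list, NumPy array, ...).
    A trailing remainder shorter than `clip_len` is dropped (an assumption).
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


# Usage: a synthetic 3.5-second recording yields three 1-second clips.
recording = [0] * int(3.5 * SAMPLE_RATE)
clips = split_into_clips(recording)
print(len(clips), len(clips[0]))
```

Holding out a whole source file for validation, rather than individual clips, avoids clips from the same recording appearing in both splits.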

Additional Documentation: Explore on Papers With Code
Homepage: https://arxiv.org/abs/1804.03209
Source code: tfds.datasets.speech_commands.Builder
Versions:
- 0.0.3 (default): Fix audio data type with dtype=tf.int16.
Download size: 2.37 GiB
Dataset size: 8.17 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	4,890
`'train'`	85,511
`'validation'`	10,102
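Because the description notes that "unknown" dominates the train and validation splits, some form of class reweighting may be useful during training. The sketch below shows one common option, inverse-frequency weights; it is an illustration rather than anything TFDS provides, the helper name `inverse_frequency_weights` is hypothetical, and the label ids are toy values, not the dataset's real label mapping or counts.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight} where weight = N / (K * count).

    N is the number of examples and K the number of distinct labels,
    so a perfectly balanced label set gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {
        label: total / (num_classes * count)
        for label, count in counts.items()
    }


# Toy label ids (made up for illustration): class 0 is over-represented.
toy_labels = [0] * 6 + [1] * 2 + [2] * 2
weights = inverse_frequency_weights(toy_labels)
print(weights)
```

The rarer classes receive larger weights, which can then be passed to a loss function or used for resampling.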

Feature structure:

FeaturesDict({
    'audio': Audio(shape=(None,), dtype=int16),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=12),
})
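Since `audio` is a variable-length `int16` vector, models that expect fixed-size floating-point input typically need a conversion step. The following is a minimal sketch using NumPy, assuming 1-second clips at 16 kHz (so a target length of 16,000 samples, which is not stated on this page); the helper name `to_float_fixed_length` is hypothetical.

```python
import numpy as np


def to_float_fixed_length(audio_int16, target_len=16_000):
    """Scale int16 PCM samples to float32 in [-1, 1) and pad or trim.

    Shorter clips are zero-padded at the end; longer clips are truncated.
    `target_len` defaults to an assumed 16,000 samples (1 s at 16 kHz).
    """
    audio = np.asarray(audio_int16, dtype=np.int16).astype(np.float32)
    audio = audio / 32768.0  # int16 range is [-32768, 32767]
    if audio.shape[0] >= target_len:
        return audio[:target_len]
    return np.pad(audio, (0, target_len - audio.shape[0]))


# Usage: a 3-sample clip padded to 5 samples.
example = to_float_fixed_length(
    np.array([32767, -32768, 0], dtype=np.int16), target_len=5
)
print(example.shape, example.dtype)
```

Dividing by 32768 rather than 32767 keeps the most negative sample at exactly -1.0; either convention is seen in practice.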

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
audio	Audio	(None,)	int16
label	ClassLabel		int64

Supervised keys (See as_supervised doc): ('audio', 'label')
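As a sketch of what the supervised keys mean in practice, passing `as_supervised=True` to `tfds.load` yields `(audio, label)` tuples instead of feature dictionaries. The wrapper names `load_speech_commands` and `peek` below are hypothetical, and the first call downloads and prepares the dataset, so it needs network access and disk space matching the sizes listed above.

```python
def load_speech_commands(split="train"):
    """Return a tf.data.Dataset of (audio, label) pairs for one split."""
    # Imported lazily so that defining this helper does not require TFDS.
    import tensorflow_datasets as tfds

    # as_supervised=True follows the supervised keys ('audio', 'label').
    return tfds.load("speech_commands", split=split, as_supervised=True)


def peek(split="validation"):
    """Print the shape, dtype, and label id of the first example."""
    dataset = load_speech_commands(split)
    for audio, label in dataset.take(1):
        print(audio.shape, audio.dtype, int(label.numpy()))
```

Calling `peek()` is left to the reader because it triggers the download.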
Figure (tfds.show_examples): Not supported.

Citation:

@article{speechcommandsv2,
   author = { {Warden}, P.},
    title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1804.03209},
  primaryClass = "cs.CL",
  keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
    year = 2018,
    month = apr,
    url = {https://arxiv.org/abs/1804.03209},
}