librispeech

  • Description:

LibriSpeech is a corpus of approximately 1,000 hours of read English speech with a sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project and has been carefully segmented and aligned.

It is recommended to use lazy audio decoding for faster reads and a smaller dataset size:

  • install the tensorflow_io library: pip install tensorflow-io
  • enable lazy decoding: tfds.load('librispeech', builder_kwargs={'config': 'lazy_decode'})

Split             Examples
'dev_clean'          2,703
'dev_other'          2,864
'test_clean'         2,620
'test_other'         2,939
'train_clean100'    28,539
'train_clean360'   104,014
'train_other500'   148,688
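Taken together, the three training splits above contain 281,241 utterances (the 100-, 360-, and 500-hour partitions, roughly 960 hours of audio). A minimal sketch, assuming the counts from the table; the combined-split string at the end uses TFDS's standard 'a+b' split syntax:

```python
# Example counts per training split, copied from the table above.
train_splits = {
    "train_clean100": 28_539,
    "train_clean360": 104_014,
    "train_other500": 148_688,
}

# Total number of training utterances across all three partitions.
total_train = sum(train_splits.values())
print(total_train)  # 281241

# A combined split spec usable with tfds.load(..., split=...).
combined_split = "+".join(train_splits)
print(combined_split)  # train_clean100+train_clean360+train_other500
```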
  • Feature structure:
FeaturesDict({
    'chapter_id': int64,
    'id': string,
    'speaker_id': int64,
    'speech': Audio(shape=(None,), dtype=int16),
    'text': Text(shape=(), dtype=string),
})
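The speech feature holds raw PCM samples as int16. A common preprocessing step is scaling them to float32 in [-1.0, 1.0]; a minimal NumPy sketch (the sample array here is a synthetic stand-in for one example's speech feature, not real LibriSpeech data):

```python
import numpy as np

def pcm16_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 in [-1.0, 1.0]."""
    return samples.astype(np.float32) / 32768.0

# Synthetic stand-in for the 'speech' feature of one example.
speech = np.array([0, 16384, -32768, 32767], dtype=np.int16)
audio = pcm16_to_float(speech)
# audio is float32: 0.0, 0.5, -1.0, and just under 1.0
```

Dividing by 32768 (rather than 32767) keeps the minimum exactly at -1.0, at the cost of the maximum falling just short of 1.0.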
  • Feature documentation:
Feature      Class          Shape    Dtype   Description
             FeaturesDict
chapter_id   Tensor                  int64
id           Tensor                  string
speaker_id   Tensor                  int64
speech       Audio          (None,)  int16
text         Text                    string

  • Citation:
@inproceedings{panayotov2015librispeech,
  title={Librispeech: an ASR corpus based on public domain audio books},
  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
  booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
  pages={5206--5210},
  year={2015},
  organization={IEEE}
}

librispeech/default (default config)

  • Config description: Default dataset.

  • Versions:

    • 2.1.1 (default): Fix speech data type with dtype=tf.int16.
    • 2.1.2: Add 'lazy_decode' config.
  • Dataset size: 304.47 GiB

  • Examples (tfds.as_dataframe):

librispeech/lazy_decode

  • Config description: Raw audio dataset.

  • Versions:

    • 2.1.1: Fix speech data type with dtype=tf.int16.
    • 2.1.2 (default): Add 'lazy_decode' config.
  • Dataset size: 59.37 GiB

  • Examples (tfds.as_dataframe):