librispeech

  • Description:

LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.87

Split Examples
'dev_clean' 2,703
'dev_other' 2,864
'test_clean' 2,620
'test_other' 2,939
'train_clean100' 28,539
'train_clean360' 104,014
'train_other500' 148,688
@inproceedings{panayotov2015librispeech,
  title={Librispeech: an ASR corpus based on public domain audio books},
  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
  booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
  pages={5206--5210},
  year={2015},
  organization={IEEE}
}

librispeech/plain_text (default config)

  • Config description: Transcriptions are in plain text.

  • Dataset size: 304.47 GiB

  • Features:

FeaturesDict({
    'chapter_id': tf.int64,
    'id': tf.string,
    'speaker_id': tf.int64,
    'speech': Audio(shape=(None,), dtype=tf.int64),
    'text': Text(shape=(), dtype=tf.string),
})

librispeech/subwords8k

  • Config description: Transcriptions use the SubwordTextEncoder

  • Dataset size: 304.44 GiB

  • Features:

FeaturesDict({
    'chapter_id': tf.int64,
    'id': tf.string,
    'speaker_id': tf.int64,
    'speech': Audio(shape=(None,), dtype=tf.int64),
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8215>),
})

librispeech/subwords32k

  • Config description: Transcriptions use the SubwordTextEncoder

  • Dataset size: 304.44 GiB

  • Features:

FeaturesDict({
    'chapter_id': tf.int64,
    'id': tf.string,
    'speaker_id': tf.int64,
    'speech': Audio(shape=(None,), dtype=tf.int64),
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32550>),
})