- Description:
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded in 2016-17 by the LibriVox project and is also in the public domain.
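As a quick sanity check on the figures above, the mean clip length follows from the clip count and the total duration. The sketch below takes the approximate 24-hour total at face value, so the result is only a rough estimate.

```python
# Rough average clip length from the figures quoted in the description.
NUM_CLIPS = 13_100   # number of clips in the dataset
TOTAL_HOURS = 24     # approximate total duration, as stated in the description

total_seconds = TOTAL_HOURS * 3600
mean_clip_seconds = total_seconds / NUM_CLIPS

print(f"mean clip length: {mean_clip_seconds:.1f} s")
```

The estimate falls inside the stated 1 to 10 second range, which is consistent with the description.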
Additional Documentation: Explore on Papers With Code
Homepage: https://keithito.com/LJ-Speech-Dataset/
Source code: tfds.datasets.ljspeech.Builder
Versions:
1.1.1 (default): Fix speech data type with dtype=tf.int16.
Download size: 2.56 GiB
Dataset size: 10.73 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 13,100 |
- Feature structure:
FeaturesDict({
'id': string,
'speech': Audio(shape=(None,), dtype=int16),
'text': Text(shape=(), dtype=string),
'text_normalized': Text(shape=(), dtype=string),
})
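The feature structure above can be read through the standard `tfds.load` entry point. The following is a minimal sketch, assuming `tensorflow_datasets` is installed and the dataset (about 2.56 GiB) can be downloaded. `pcm16_to_float` and `first_example` are hypothetical helper names, not part of TFDS, and dividing by 32768 is a common convention for int16 PCM rather than something this page specifies.

```python
def pcm16_to_float(samples):
    """Scale int16 PCM samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def first_example():
    """Load one record from the 'train' split and return its fields."""
    # Imported lazily so the helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("ljspeech", split="train")
    example = next(iter(tfds.as_numpy(ds.take(1))))
    return {
        # Scalar string features come back as bytes.
        "id": example["id"].decode("utf-8"),
        "text": example["text"].decode("utf-8"),
        "text_normalized": example["text_normalized"].decode("utf-8"),
        # 'speech' is a 1-D int16 array of variable length.
        "speech": pcm16_to_float(example["speech"].tolist()),
    }
```

Calling `first_example()` triggers the download and preparation step on first use, so it is kept separate from the pure conversion helper.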
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
id | Tensor | | string | |
speech | Audio | (None,) | int16 | |
text | Text | | string | |
text_normalized | Text | | string | |
Supervised keys (see the as_supervised doc): ('text_normalized', 'speech')
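With `as_supervised=True`, `tfds.load` yields `(input, label)` tuples built from the supervised keys instead of full feature dictionaries. The sketch below mimics that projection on a plain dictionary. `to_supervised` is a hypothetical helper, and the sample record is made up for illustration only.

```python
SUPERVISED_KEYS = ("text_normalized", "speech")


def to_supervised(example, keys=SUPERVISED_KEYS):
    """Project a feature dict onto an (input, label) tuple."""
    input_key, label_key = keys
    return example[input_key], example[label_key]


# Made-up record using the same field names as the feature structure.
record = {
    "id": "example-0001",
    "speech": [0, 12, -7],
    "text": "Chapter 1.",
    "text_normalized": "Chapter one.",
}
pair = to_supervised(record)
print(pair)
```

Against the real dataset, the equivalent call would be `tfds.load("ljspeech", split="train", as_supervised=True)`, which pairs the normalized text (input) with the audio (label), a layout suited to text-to-speech training.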
Figure (tfds.show_examples): Not supported.
- Citation:
@misc{ljspeech17,
author = {Keith Ito},
title = {The LJ Speech Dataset},
howpublished = {\url{https://keithito.com/LJ-Speech-Dataset/} },
year = 2017
}