
ljspeech

Description:

This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded in 2016-17 by the LibriVox project and is also in the public domain.
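As a rough sanity check on the figures in the description, the mean clip length can be worked out from the clip count and the total duration. The sketch below treats "approximately 24 hours" as exactly 24 hours, which is an assumption made only for the arithmetic.

```python
NUM_CLIPS = 13_100   # clip count from the description
TOTAL_HOURS = 24     # "approximately 24 hours", treated as exact here

# Convert the total duration to seconds and divide by the number of clips.
mean_clip_seconds = TOTAL_HOURS * 3600 / NUM_CLIPS
print(f"mean clip length: {mean_clip_seconds:.1f} s")
```

The result is about 6.6 seconds per clip, which sits inside the stated range of 1 to 10 seconds.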

Additional Documentation: Explore on Papers With Code
Homepage: https://keithito.com/LJ-Speech-Dataset/
Source code: tfds.datasets.ljspeech.Builder
Versions:
- 1.1.1 (default): Fix speech data type with dtype=tf.int16.
Download size: 2.56 GiB
Dataset size: 10.73 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	13,100

Feature structure:

FeaturesDict({
    'id': string,
    'speech': Audio(shape=(None,), dtype=int16),
    'text': Text(shape=(), dtype=string),
    'text_normalized': Text(shape=(), dtype=string),
})
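The feature structure above can be explored with a short loading sketch. `tfds.load` is the standard TFDS entry point; `load_ljspeech` and `describe_example` are hypothetical helper names introduced here for illustration, and the example dictionary at the bottom is synthetic stand-in data, not a real record from the dataset.

```python
import numpy as np


def load_ljspeech(split="train", data_dir=None):
    """Return the requested split as a tf.data.Dataset of feature dicts."""
    # Imported lazily so the helper below works without TFDS installed.
    # The first call downloads and prepares the data (about 2.56 GiB).
    import tensorflow_datasets as tfds

    return tfds.load("ljspeech", split=split, data_dir=data_dir)


def describe_example(example):
    """Summarise one NumPy-converted feature dict (hypothetical helper)."""
    return {
        "id": example["id"].decode("utf-8"),
        "num_samples": int(example["speech"].shape[0]),
        "speech_dtype": str(example["speech"].dtype),
        "text": example["text"].decode("utf-8"),
        "text_normalized": example["text_normalized"].decode("utf-8"),
    }


# Synthetic stand-in with the same keys and dtypes as a real example.
fake_example = {
    "id": b"demo-0001",
    "speech": np.zeros(4, dtype=np.int16),
    "text": b"Dr. Smith paid $5.",
    "text_normalized": b"Doctor Smith paid five dollars.",
}
summary = describe_example(fake_example)
print(summary)
```

With the dataset prepared locally, the same helper should work on real data via `for ex in tfds.as_numpy(load_ljspeech().take(1)): print(describe_example(ex))`, since `tfds.as_numpy` yields string features as `bytes` and audio as NumPy arrays.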

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
id	Tensor		string
speech	Audio	(None,)	int16
text	Text		string
text_normalized	Text		string
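Because `speech` is stored as `int16` samples, many audio pipelines first rescale it to floating point. The following is a minimal sketch of that conversion, not part of the TFDS API. The helper names are hypothetical, and the 22,050 Hz sample rate is an assumption that this page does not state; verify it against the dataset homepage before relying on the duration helper.

```python
import numpy as np

# Assumption: the sample rate is not given in the feature table above.
ASSUMED_SAMPLE_RATE_HZ = 22_050


def pcm16_to_float(speech):
    """Rescale int16 PCM samples to float32 values in [-1.0, 1.0)."""
    speech = np.asarray(speech)
    if speech.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {speech.dtype}")
    # int16 spans -32768..32767, so dividing by 32768 maps into [-1, 1).
    return speech.astype(np.float32) / 32768.0


def duration_seconds(speech, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ):
    """Clip duration in seconds for a 1-D array of samples."""
    return len(speech) / float(sample_rate_hz)


samples = np.array([-32768, 0, 16384], dtype=np.int16)
scaled = pcm16_to_float(samples)
one_second = duration_seconds(np.zeros(ASSUMED_SAMPLE_RATE_HZ, dtype=np.int16))
```

Raising on any dtype other than `int16` is a deliberate choice here: it guards against silently rescaling data that has already been converted to floating point.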

Supervised keys (See as_supervised doc): ('text_normalized', 'speech')
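In TFDS, supervised keys are ordered as (input, target), so with `as_supervised=True` each element is a `(text_normalized, speech)` pair rather than a dictionary. That orientation suits text-to-speech training. The sketch below shows the loading call and mirrors the pairing on a plain dictionary; `load_supervised` and `to_supervised_pair` are hypothetical helper names, and the sample dictionary is synthetic.

```python
SUPERVISED_KEYS = ("text_normalized", "speech")  # as listed above


def load_supervised(split="train"):
    """Return a tf.data.Dataset of (text_normalized, speech) tuples."""
    # Imported lazily; the first call downloads and prepares the dataset.
    import tensorflow_datasets as tfds

    return tfds.load("ljspeech", split=split, as_supervised=True)


def to_supervised_pair(example, keys=SUPERVISED_KEYS):
    """Mirror the as_supervised pairing on a plain feature dict."""
    input_key, target_key = keys
    return example[input_key], example[target_key]


# Synthetic feature dict used only to illustrate the pairing.
pair = to_supervised_pair(
    {
        "id": "demo",
        "speech": [0, 1, -1],
        "text": "Hello!",
        "text_normalized": "hello",
    }
)
print(pair)
```

For speech recognition, where audio is the input and text the target, one option is to load without `as_supervised` and build the pair in the opposite order.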
Figure (tfds.show_examples): Not supported.

Citation:

@misc{ljspeech17,
  author       = {Keith Ito},
  title        = {The LJ Speech Dataset},
  howpublished = {\url{https://keithito.com/LJ-Speech-Dataset/}},
  year         = 2017
}