
vctk

Description:

The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads about 400 sentences selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive.

Note that the 'p315' text was lost due to a hard disk error.

Additional Documentation: Explore on Papers With Code
Homepage: https://doi.org/10.7488/ds/2645
Source code: tfds.audio.Vctk
Versions:
- 1.0.0: VCTK release 0.92.0.
- 1.0.1 (default): Fix speech data type with dtype=tf.int16.
Download size: 10.94 GiB
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'accent': ClassLabel(shape=(), dtype=int64, num_classes=13),
    'gender': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'id': string,
    'speaker': ClassLabel(shape=(), dtype=int64, num_classes=110),
    'speech': Audio(shape=(None,), dtype=int16),
    'text': Text(shape=(), dtype=string),
})
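
As a rough sketch of how the feature structure above might be consumed, the snippet below loads one config with `tfds.load` and summarizes a single example. The helper names `load_vctk` and `describe_example` are illustrative and not part of TFDS. The sketch assumes examples have been converted with `tfds.as_numpy`, so string features arrive as `bytes` and `speech` as a 1-D int16 array.

```python
from typing import Any, Dict, Mapping

# Feature names copied from the FeaturesDict above.
EXPECTED_FEATURES = ("accent", "gender", "id", "speaker", "speech", "text")


def load_vctk(config: str = "mic1", split: str = "train"):
    """Return (dataset, info) for one VCTK config.

    Hypothetical convenience wrapper; the first call downloads and
    prepares the data (the download is about 10.94 GiB).
    """
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    return tfds.load(f"vctk/{config}", split=split, with_info=True)


def _to_str(value: Any) -> str:
    # tfds.as_numpy yields string features as bytes.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def describe_example(example: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarize one example dict without copying the audio samples."""
    missing = [name for name in EXPECTED_FEATURES if name not in example]
    if missing:
        raise KeyError(f"example is missing features: {missing}")
    return {
        "id": _to_str(example["id"]),
        "speaker": int(example["speaker"]),  # ClassLabel index, 0..109
        "accent": int(example["accent"]),    # ClassLabel index, 0..12
        "gender": int(example["gender"]),    # ClassLabel index, 0..1
        "num_samples": len(example["speech"]),
        "text": _to_str(example["text"]),
    }
```

With the dataset prepared, one possible usage is `ds, info = load_vctk("mic1")` followed by `describe_example(next(iter(tfds.as_numpy(ds))))`. The integer class labels can be mapped back to names with `info.features["speaker"].int2str(index)`.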

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
accent	ClassLabel		int64
gender	ClassLabel		int64
id	Tensor		string
speaker	ClassLabel		int64
speech	Audio	(None,)	int16
text	Text		string

Supervised keys (See as_supervised doc): ('text', 'speech')
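
The supervised keys put `text` first and `speech` second, so `as_supervised=True` yields `(text, speech)` tuples, which is the input/target order for speech synthesis. The sketch below shows one way to load those pairs and, optionally, swap them for speech recognition. The names `load_supervised` and `as_asr_pair` are hypothetical helpers, not TFDS functions.

```python
def as_asr_pair(text, speech):
    """Reorder a (text, speech) pair to (speech, text)."""
    return speech, text


def load_supervised(config: str = "mic1", for_asr: bool = False):
    """Load VCTK as 2-tuples instead of feature dicts.

    By default each element is (text, speech), following the supervised
    keys above. With for_asr=True the pair is swapped to (speech, text).
    """
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    ds = tfds.load(f"vctk/{config}", split="train", as_supervised=True)
    # tf.data unpacks tuple elements into positional arguments of the map fn.
    return ds.map(as_asr_pair) if for_asr else ds
```

The other features (`accent`, `gender`, `id`, `speaker`) are dropped in supervised mode, so a model that conditions on speaker identity would need the dict form instead.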
Figure (tfds.show_examples): Not supported.
Citation:

@misc{yamagishi2019vctk,
  author={Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten},
  title={{CSTR VCTK Corpus}: English Multi-speaker Corpus for {CSTR} Voice Cloning Toolkit (version 0.92)},
  publisher={University of Edinburgh. The Centre for Speech Technology Research (CSTR)},
  year=2019,
  doi={10.7488/ds/2645},
}

vctk/mic1 (default config)

Config description: Audio recorded using an omni-directional microphone (DPA 4035). Contains very low-frequency noise. This is the same audio released in previous versions of VCTK: https://doi.org/10.7488/ds/1994
Dataset size: 39.87 GiB
Splits:

Split	Examples
`'train'`	44,455
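
Because only a `'train'` split is provided, a held-out set has to be carved out by the user. One option is the TFDS split slicing syntax, sketched below. The helper names and the 10% default are assumptions chosen for illustration.

```python
from typing import Tuple


def holdout_splits(holdout_percent: int = 10) -> Tuple[str, str]:
    """Build two TFDS split strings that partition 'train' by percentage."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_train_and_holdout(config: str = "mic1", holdout_percent: int = 10):
    """Return [train_ds, holdout_ds] for one VCTK config."""
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    train_split, holdout_split = holdout_splits(holdout_percent)
    # Passing a list of splits makes tfds.load return a list of datasets.
    return tfds.load(f"vctk/{config}", split=[train_split, holdout_split])
```

Percentage slices cut by example position, not by speaker, so the same speaker may appear on both sides. A speaker-disjoint evaluation would need filtering on the `speaker` feature instead.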


vctk/mic2

Config description: Audio recorded using a small-diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800). Two speakers, p280 and p315, had technical issues with the audio recordings made using the MKH 800.
Dataset size: 38.86 GiB
Splits:

Split	Examples
`'train'`	43,873
