voxceleb

  • Description:

An large scale dataset for speaker identification. This data is collected from over 1,251 speakers, with over 150k samples in total. This release contains the audio part of the voxceleb1.1 dataset.

Split Examples
'test' 7,972
'train' 134,000
'validation' 6,670
  • Feature structure:
FeaturesDict({
    'audio': Audio(shape=(None,), dtype=tf.int64),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1252),
    'youtube_id': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature Class Shape Dtype Description
FeaturesDict
audio Audio (None,) tf.int64
label ClassLabel tf.int64
youtube_id Text tf.string
  • Citation:
@InProceedings{Nagrani17,
    author       = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
    title        = "VoxCeleb: a large-scale speaker identification dataset",
    booktitle    = "INTERSPEECH",
    year         = "2017",
}