- Description:
FLEURS is the speech version of the FLORES machine translation benchmark, covering 2000 n-way parallel sentences in n=102 languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation, and retrieval. Covering 102 languages from 10+ language families, 3 different domains, and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in “universal” speech representation learning.
In this version, only the FLEURS dataset is provided, which covers speech recognition and speech-to-text translation.
Config description: FLEURS is the speech version of the FLORES machine translation benchmark, covering 2000 n-way parallel sentences in n=102 languages.
Homepage: https://arxiv.org/abs/2205.12446
Source code:
tfds.audio.xtreme_s.XtremeS
Versions:
2.0.0
(default): Initial release on TFDS, FLEURS-only. Named to match version 2.0.0 on Hugging Face, which has the same FLEURS data (https://huggingface.co/datasets/google/xtreme_s).
Download size:
Unknown size
Dataset size:
Unknown size
Auto-cached (documentation): Unknown
Splits:
Split | Examples
:---- | -------:
- Feature structure:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=tf.int64),
'gender': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
'id': Scalar(shape=(), dtype=tf.int32),
'lang_group_id': ClassLabel(shape=(), dtype=tf.int64, num_classes=7),
'lang_id': ClassLabel(shape=(), dtype=tf.int64, num_classes=102),
'language': Text(shape=(), dtype=tf.string),
'num_samples': Scalar(shape=(), dtype=tf.int32),
'path': tf.string,
'raw_transcription': Text(shape=(), dtype=tf.string),
'transcription': Text(shape=(), dtype=tf.string),
})
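Because `audio` is stored as integer samples (`tf.int64`) and `num_samples` gives the frame count, a typical first preprocessing step is to scale the samples to floats and derive the clip duration. Below is a minimal sketch in plain Python; the 16 kHz sample rate and 16-bit PCM value range are assumptions, as the catalog entry above does not state either.

```python
# Hypothetical preprocessing sketch: normalize integer audio samples to
# floats in [-1, 1] and compute clip duration from the frame count.
# ASSUMPTIONS (not stated in this catalog entry): 16 kHz sample rate and
# 16-bit PCM sample values stored in the int64 'audio' feature.

SAMPLE_RATE_HZ = 16_000  # assumed FLEURS sample rate
INT16_SCALE = 32768.0    # assumed 16-bit PCM range

def normalize_audio(samples):
    """Scale integer PCM samples to floats in [-1.0, 1.0]."""
    return [s / INT16_SCALE for s in samples]

def duration_seconds(num_samples):
    """Clip length in seconds, given the 'num_samples' feature."""
    return num_samples / SAMPLE_RATE_HZ

# Example with a tiny fake waveform:
fake_audio = [0, 16384, -32768, 32767]
floats = normalize_audio(fake_audio)
print(floats[1])                 # 0.5
print(duration_seconds(48_000))  # 3.0
```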
- Feature documentation:
Feature | Class | Shape | Dtype | Description
:------ | :---- | :---- | :---- | :----------
 | FeaturesDict | | | |
audio | Audio | (None,) | tf.int64 | |
gender | ClassLabel | | tf.int64 | |
id | Scalar | | tf.int32 | Source text identifier, consistent across all languages to keep n-way parallelism of translations. Since each transcription may be spoken by multiple speakers, within each language multiple examples will also share the same id. |
lang_group_id | ClassLabel | | tf.int64 | |
lang_id | ClassLabel | | tf.int64 | |
language | Text | | tf.string | Language encoded as a lowercase, underscore-separated version of a BCP-47 tag. |
num_samples | Scalar | | tf.int32 | Total number of frames in the audio. |
path | Tensor | | tf.string | |
raw_transcription | Text | | tf.string | Raw transcription from FLoRes. |
transcription | Text | | tf.string | Normalized transcription. |
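The `language` feature is documented as a lowercase, underscore-separated version of a BCP-47 tag. A small hypothetical helper can recover the conventional tag casing; the example values below are illustrative assumptions, as the catalog entry does not list the actual language strings.

```python
def to_bcp47(language: str) -> str:
    """Convert an underscore-separated lowercase tag (e.g. 'af_za')
    to conventional BCP-47 casing (e.g. 'af-ZA').

    Casing follows BCP 47 convention: the primary language subtag is
    lowercase, 4-letter script subtags are Title-case, and 2-letter
    region subtags are UPPERCASE.
    """
    parts = language.split("_")
    out = [parts[0].lower()]
    for sub in parts[1:]:
        if len(sub) == 4:    # script subtag, e.g. 'hans' -> 'Hans'
            out.append(sub.title())
        elif len(sub) == 2:  # region subtag, e.g. 'za' -> 'ZA'
            out.append(sub.upper())
        else:
            out.append(sub)
    return "-".join(out)

print(to_bcp47("af_za"))        # af-ZA
print(to_bcp47("cmn_hans_cn"))  # cmn-Hans-CN
```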
Supervised keys (See as_supervised doc): ('audio', 'transcription')
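With `as_supervised=True`, each example is reduced from the full feature dict to an `(audio, transcription)` tuple. The sketch below shows what that selection amounts to, using a fabricated stand-in dict; the real pipeline performs this reduction internally in tfds.

```python
# Sketch of what the supervised keys ('audio', 'transcription') select.
# The example dict below is a fabricated stand-in for a real dataset
# element; real 'audio' values are int64 sample arrays.

SUPERVISED_KEYS = ("audio", "transcription")

def to_supervised(example: dict) -> tuple:
    """Reduce a full feature dict to the (input, label) pair."""
    return tuple(example[k] for k in SUPERVISED_KEYS)

fake_example = {
    "audio": [0, 1, 2],
    "transcription": "hello world",
    "lang_id": 0,  # extra features are dropped in supervised mode
}
audio, text = to_supervised(fake_example)
print(text)  # hello world
```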
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@article{fleurs2022arxiv,
title = {FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech},
author = {Conneau, Alexis and Ma, Min and Khanuja, Simran and Zhang, Yu and Axelrod, Vera and Dalmia, Siddharth and Riesa, Jason and Rivera, Clara and Bapna, Ankur},
journal={arXiv preprint arXiv:2205.12446},
url = {https://arxiv.org/abs/2205.12446},
year = {2022},
}
@article{conneau2022xtreme,
title={XTREME-S: Evaluating Cross-lingual Speech Representations},
author={Conneau, Alexis and Bapna, Ankur and Zhang, Yu and Ma, Min and von Platen, Patrick and Lozhkov, Anton and Cherry, Colin and Jia, Ye and Rivera, Clara and Kale, Mihir and others},
journal={arXiv preprint arXiv:2203.10752},
year={2022}
}