References:
ab
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ab')
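The same naming pattern applies to every language config on this page. As a minimal sketch, assuming `tensorflow_datasets` is installed, the load call can be wrapped in a helper; `common_voice_name` and `load_common_voice` are hypothetical names introduced here for illustration, not part of TFDS.

```python
def common_voice_name(locale: str) -> str:
    """Build the TFDS name for a Common Voice config such as 'ab' or 'fy-NL'."""
    return f"huggingface:common_voice/{locale}"


def load_common_voice(locale: str, split: str = "train"):
    """Load one split of a config (hypothetical helper, not a TFDS function)."""
    # Imported lazily so that common_voice_name() works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(common_voice_name(locale), split=split)


print(common_voice_name("ab"))
```

Calling `load_common_voice("ab", split="test")` would then request the `'test'` split listed in the table for that config.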
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 8 |
'other' | 752 |
'test' | 9 |
'train' | 22 |
'validated' | 31 |
'validation' | 0 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
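The feature specification above is plain JSON, so it can be inspected with the standard library. The sketch below copies three of the entries and maps each field to its `dtype` (for `Value` features) or its `_type` (for anything else, such as `Audio`); `summarize_features` is a hypothetical helper written for this example.

```python
import json

# Three entries copied from the feature specification above.
FEATURES_JSON = """
{
  "client_id": {"dtype": "string", "id": null, "_type": "Value"},
  "audio": {"sampling_rate": 48000, "mono": true, "decode": true,
            "id": null, "_type": "Audio"},
  "up_votes": {"dtype": "int64", "id": null, "_type": "Value"}
}
"""


def summarize_features(features_json: str) -> dict:
    """Map each feature name to its dtype, or to its _type when it has no dtype."""
    features = json.loads(features_json)
    summary = {}
    for name, spec in features.items():
        if spec["_type"] == "Value":
            summary[name] = spec["dtype"]
        else:
            summary[name] = spec["_type"]
    return summary


summary = summarize_features(FEATURES_JSON)
for name, kind in summary.items():
    print(f"{name}: {kind}")
```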
ar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ar')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 6333 |
'other' | 18283 |
'test' | 7622 |
'train' | 14227 |
'validated' | 43291 |
'validation' | 7517 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
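Split sizes vary widely between configs, so it can help to compare them before choosing a training setup. The sketch below uses the `ar` counts from the table above to compute each split's share of the combined `train`, `validation`, and `test` examples; `split_shares` is a hypothetical helper, and the choice of these three splits is an assumption made for illustration.

```python
# Example counts for the 'ar' config, copied from the Splits table above.
AR_SPLITS = {
    "invalidated": 6333,
    "other": 18283,
    "test": 7622,
    "train": 14227,
    "validated": 43291,
    "validation": 7517,
}


def split_shares(counts: dict, names=("train", "validation", "test")) -> dict:
    """Return each named split's fraction of the combined size of those splits."""
    total = sum(counts[name] for name in names)
    if total == 0:
        # Avoid dividing by zero for configs whose chosen splits are all empty.
        return {name: 0.0 for name in names}
    return {name: counts[name] / total for name in names}


shares = split_shares(AR_SPLITS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```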
as
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/as')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 31 |
'other' | 0 |
'test' | 110 |
'train' | 270 |
'validated' | 504 |
'validation' | 124 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
br
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/br')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 623 |
'other' | 10912 |
'test' | 2087 |
'train' | 2780 |
'validated' | 8560 |
'validation' | 1997 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ca
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ca')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 18846 |
'other' | 64446 |
'test' | 15724 |
'train' | 285584 |
'validated' | 416701 |
'validation' | 15724 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cnh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/cnh')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 433 |
'other' | 2934 |
'test' | 752 |
'train' | 807 |
'validated' | 2432 |
'validation' | 756 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/cs')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 685 |
'other' | 7475 |
'test' | 4144 |
'train' | 5655 |
'validated' | 30431 |
'validation' | 4118 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/cv')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 1282 |
'other' | 6927 |
'test' | 788 |
'train' | 931 |
'validated' | 3496 |
'validation' | 818 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/cy')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 3648 |
'other' | 17919 |
'test' | 4820 |
'train' | 6839 |
'validated' | 72984 |
'validation' | 4776 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/de')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 32789 |
'other' | 10095 |
'test' | 15588 |
'train' | 246525 |
'validated' | 565186 |
'validation' | 15588 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
dv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/dv')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 840 |
'other' | 0 |
'test' | 2202 |
'train' | 2680 |
'validated' | 11866 |
'validation' | 2077 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
el
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/el')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 185 |
'other' | 5659 |
'test' | 1522 |
'train' | 2316 |
'validated' | 5996 |
'validation' | 1401 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/en')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 189562 |
'other' | 169895 |
'test' | 16164 |
'train' | 564337 |
'validated' | 1224864 |
'validation' | 16164 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
eo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/eo')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 4736 |
'other' | 2946 |
'test' | 8969 |
'train' | 19587 |
'validated' | 58094 |
'validation' | 8987 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/es')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 40640 |
'other' | 144791 |
'test' | 15089 |
'train' | 161813 |
'validated' | 236314 |
'validation' | 15089 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
et
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/et')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 3557 |
'other' | 569 |
'test' | 2509 |
'train' | 2966 |
'validated' | 10683 |
'validation' | 2507 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
eu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/eu')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 5387 |
'other' | 23570 |
'test' | 5172 |
'train' | 7505 |
'validated' | 63009 |
'validation' | 5172 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/fa')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 11698 |
'other' | 22510 |
'test' | 5213 |
'train' | 7593 |
'validated' | 251659 |
'validation' | 5213 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/fi')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 59 |
'other' | 149 |
'test' | 428 |
'train' | 460 |
'validated' | 1305 |
'validation' | 415 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/fr')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 40351 |
'other' | 3222 |
'test' | 15763 |
'train' | 298982 |
'validated' | 461004 |
'validation' | 15763 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fy-NL
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/fy-NL')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 1031 |
'other' | 21569 |
'test' | 3020 |
'train' | 3927 |
'validated' | 10495 |
'validation' | 2790 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ga-IE
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ga-IE')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 409 |
'other' | 2130 |
'test' | 506 |
'train' | 541 |
'validated' | 3352 |
'validation' | 497 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
hi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/hi')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 60 |
'other' | 139 |
'test' | 127 |
'train' | 157 |
'validated' | 419 |
'validation' | 135 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
hsb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/hsb')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 227 |
'other' | 62 |
'test' | 387 |
'train' | 808 |
'validated' | 1367 |
'validation' | 172 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
hu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/hu')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 169 |
'other' | 295 |
'test' | 1649 |
'train' | 3348 |
'validated' | 6457 |
'validation' | 1434 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
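Every config on this page is loaded under a name of the form `huggingface:common_voice/<config>`. The sketch below wraps that pattern in two hypothetical helpers, `tfds_name` and `load_split`, which are illustration only and not part of TFDS. It assumes `tensorflow-datasets` is installed and that the dataset can be downloaded in your environment; extra packages may be needed for `huggingface:` datasets. The `split` argument takes one of the split names from the tables.

```python
def tfds_name(config: str) -> str:
    """Build the TFDS dataset name for one Common Voice language config."""
    return f"huggingface:common_voice/{config}"


def load_split(config: str, split: str = "train"):
    """Load a single split of one config.

    TFDS is imported lazily so that tfds_name() stays usable without it.
    Assumption: tensorflow-datasets is installed and the data is reachable.
    """
    import tensorflow_datasets as tfds

    return tfds.load(tfds_name(config), split=split)


# Building the name needs no download; the loading call is left commented out.
print(tfds_name("hu"))
# ds = load_split("hu", split="validation")
```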
ia
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ia')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 192 |
'other' | 1095 |
'test' | 899 |
'train' | 3477 |
'validated' | 5978 |
'validation' | 1601 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
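The feature schema is the same for every config listed here, so it can be inspected once. The following sketch parses a copy of the schema above (whitespace compacted, content unchanged) and groups the plain `Value` fields by dtype. The function name `fields_by_dtype` is hypothetical, and how TFDS exposes these fields at load time is not shown.

```python
import json

# Feature schema copied from the listing above, compacted to one field per line.
FEATURES_JSON = """
{
  "client_id": {"dtype": "string", "id": null, "_type": "Value"},
  "path": {"dtype": "string", "id": null, "_type": "Value"},
  "audio": {"sampling_rate": 48000, "mono": true, "decode": true, "id": null, "_type": "Audio"},
  "sentence": {"dtype": "string", "id": null, "_type": "Value"},
  "up_votes": {"dtype": "int64", "id": null, "_type": "Value"},
  "down_votes": {"dtype": "int64", "id": null, "_type": "Value"},
  "age": {"dtype": "string", "id": null, "_type": "Value"},
  "gender": {"dtype": "string", "id": null, "_type": "Value"},
  "accent": {"dtype": "string", "id": null, "_type": "Value"},
  "locale": {"dtype": "string", "id": null, "_type": "Value"},
  "segment": {"dtype": "string", "id": null, "_type": "Value"}
}
"""


def fields_by_dtype(schema):
    """Group the names of plain Value fields by their dtype."""
    grouped = {}
    for name, spec in schema.items():
        if spec["_type"] == "Value":
            grouped.setdefault(spec["dtype"], []).append(name)
    return grouped


features = json.loads(FEATURES_JSON)
grouped = fields_by_dtype(features)
audio_rate = features["audio"]["sampling_rate"]

print("string fields:", grouped["string"])
print("int64 fields:", grouped["int64"])
print("audio sampling rate (Hz):", audio_rate)
```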
id
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/id')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 470 |
'other' | 6782 |
'test' | 1844 |
'train' | 2130 |
'validated' | 8696 |
'validation' | 1835 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/it')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 12189 |
'other' | 14549 |
'test' | 12928 |
'train' | 58015 |
'validated' | 102579 |
'validation' | 12928 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ja')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 504 |
'other' | 885 |
'test' | 632 |
'train' | 722 |
'validated' | 3072 |
'validation' | 586 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ka
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ka')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 139 |
'other' | 44 |
'test' | 656 |
'train' | 1058 |
'validated' | 2275 |
'validation' | 527 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
kab
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/kab')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 18134 |
'other' | 88021 |
'test' | 14622 |
'train' | 120530 |
'validated' | 573718 |
'validation' | 14622 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ky
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ky')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 926 |
'other' | 7223 |
'test' | 1503 |
'train' | 1955 |
'validated' | 9236 |
'validation' | 1511 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
lg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/lg')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 290 |
'other' | 3110 |
'test' | 584 |
'train' | 1250 |
'validated' | 2220 |
'validation' | 384 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
lt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/lt')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 102 |
'other' | 1629 |
'test' | 466 |
'train' | 931 |
'validated' | 1644 |
'validation' | 244 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
lv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/lv')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 143 |
'other' | 1560 |
'test' | 1882 |
'train' | 2552 |
'validated' | 6444 |
'validation' | 2002 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
mn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/mn')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 667 |
'other' | 3272 |
'test' | 1862 |
'train' | 2183 |
'validated' | 7487 |
'validation' | 1837 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
mt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/mt')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 314 |
'other' | 5714 |
'test' | 1617 |
'train' | 2036 |
'validated' | 5747 |
'validation' | 1516 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/nl')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 3308 |
'other' | 27 |
'test' | 5708 |
'train' | 9460 |
'validated' | 52488 |
'validation' | 4938 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
or
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/or')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 62 |
'other' | 4302 |
'test' | 98 |
'train' | 388 |
'validated' | 615 |
'validation' | 129 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pa-IN
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/pa-IN')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 43 |
'other' | 1411 |
'test' | 116 |
'train' | 211 |
'validated' | 371 |
'validation' | 44 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/pl')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 4601 |
'other' | 12848 |
'test' | 5153 |
'train' | 7468 |
'validated' | 90791 |
'validation' | 5153 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/pt')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 1740 |
'other' | 8390 |
'test' | 4641 |
'train' | 6514 |
'validated' | 41584 |
'validation' | 4592 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
rm-sursilv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/rm-sursilv')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 639 |
'other' | 2102 |
'test' | 1194 |
'train' | 1384 |
'validated' | 3783 |
'validation' | 1205 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
rm-vallader
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/rm-vallader')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 374 |
'other' | 727 |
'test' | 378 |
'train' | 574 |
'validated' | 1316 |
'validation' | 357 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ro
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ro')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 485 |
'other' | 1945 |
'test' | 1778 |
'train' | 3399 |
'validated' | 6039 |
'validation' | 858 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ru')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 3056 |
'other' | 10247 |
'test' | 8007 |
'train' | 15481 |
'validated' | 74256 |
'validation' | 7963 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
rw
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/rw')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 206790 |
'other' | 22923 |
'test' | 15724 |
'train' | 515197 |
'validated' | 832929 |
'validation' | 15032 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sah
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/sah')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 66 |
'other' | 1275 |
'test' | 757 |
'train' | 1442 |
'validated' | 2606 |
'validation' | 405 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/sl')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 92 |
'other' | 2502 |
'test' | 881 |
'train' | 2038 |
'validated' | 4669 |
'validation' | 556 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
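The feature spec above can be read programmatically to see which fields are plain values and which carry audio. This is a rough sketch using only the standard library; the JSON is an abbreviated copy (three of the eleven fields), and `describe` is a hypothetical helper.

```python
import json

# Abbreviated copy of the feature spec above (three of the eleven fields).
FEATURES_JSON = """
{
  "path": {"dtype": "string", "id": null, "_type": "Value"},
  "audio": {"sampling_rate": 48000, "mono": true, "decode": true,
            "id": null, "_type": "Audio"},
  "up_votes": {"dtype": "int64", "id": null, "_type": "Value"}
}
"""


def describe(features):
    """Map each field name to a short type label."""
    labels = {}
    for name, spec in features.items():
        if spec["_type"] == "Audio":
            # Audio fields have no dtype; report the sampling rate instead.
            labels[name] = f"Audio@{spec['sampling_rate']}Hz"
        else:
            labels[name] = spec["dtype"]
    return labels


summary = describe(json.loads(FEATURES_JSON))
print(summary)
```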
sv-SE
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/sv-SE')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 462 |
'other' | 3043 |
'test' | 2027 |
'train' | 2331 |
'validated' | 12552 |
'validation' | 2019 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ta
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/ta')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 594 |
'other' | 7428 |
'test' | 1781 |
'train' | 2009 |
'validated' | 12652 |
'validation' | 1779 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
th
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/th')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 467 |
'other' | 2671 |
'test' | 2188 |
'train' | 2917 |
'validated' | 7028 |
'validation' | 1922 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/tr')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 1726 |
'other' | 325 |
'test' | 1647 |
'train' | 1831 |
'validated' | 18685 |
'validation' | 1647 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/tt')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 287 |
'other' | 1798 |
'test' | 4485 |
'train' | 11211 |
'validated' | 25781 |
'validation' | 2127 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
uk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/uk')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 1255 |
'other' | 8161 |
'test' | 3235 |
'train' | 4035 |
'validated' | 22337 |
'validation' | 3236 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
vi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/vi')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 78 |
'other' | 870 |
'test' | 198 |
'train' | 221 |
'validated' | 619 |
'validation' | 200 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
vot
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/vot')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 6 |
'other' | 411 |
'test' | 0 |
'train' | 3 |
'validated' | 3 |
'validation' | 0 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
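The `vot` config lists zero examples for `test` and `validation`, so code that iterates over every split may want to skip the empty ones. This is a small sketch with the counts copied from the table above; `non_empty_splits` is a hypothetical helper.

```python
# Split sizes copied from the 'vot' table above.
VOT_SPLITS = {
    "invalidated": 6,
    "other": 411,
    "test": 0,
    "train": 3,
    "validated": 3,
    "validation": 0,
}


def non_empty_splits(split_sizes):
    """Return the names of splits that contain at least one example."""
    return sorted(name for name, count in split_sizes.items() if count > 0)


usable = non_empty_splits(VOT_SPLITS)
print(usable)
```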
zh-CN
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/zh-CN')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 5305 |
'other' | 8948 |
'test' | 8760 |
'train' | 18541 |
'validated' | 36405 |
'validation' | 8743 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
zh-HK
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/zh-HK')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 2999 |
'other' | 38830 |
'test' | 5172 |
'train' | 7506 |
'validated' | 41835 |
'validation' | 5172 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
zh-TW
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:common_voice/zh-TW')
- Description:
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
- License: https://github.com/common-voice/common-voice/blob/main/LICENSE
- Version: 6.1.0
- Splits:
Split | Examples |
---|---|
'invalidated' | 3584 |
'other' | 22477 |
'test' | 2895 |
'train' | 3507 |
'validated' | 61232 |
'validation' | 2895 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"audio": {
"sampling_rate": 48000,
"mono": true,
"decode": true,
"id": null,
"_type": "Audio"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"up_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"down_votes": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"accent": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"locale": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"segment": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
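The split counts can differ a lot in scale within one config. As a rough illustration, the sketch below adds up `train`, `validation`, and `test` for `zh-TW` and compares the total with the `validated` count. It only compares the numbers in the table above and assumes nothing about how the splits were built; `benchmark_total` is a hypothetical helper.

```python
# Split sizes copied from the 'zh-TW' table above.
ZH_TW_SPLITS = {
    "invalidated": 3584,
    "other": 22477,
    "test": 2895,
    "train": 3507,
    "validated": 61232,
    "validation": 2895,
}


def benchmark_total(split_sizes):
    """Sum the example counts of the train, validation, and test splits."""
    return sum(split_sizes[name] for name in ("train", "validation", "test"))


total = benchmark_total(ZH_TW_SPLITS)
share = total / ZH_TW_SPLITS["validated"]
print(f"{total} vs {ZH_TW_SPLITS['validated']} validated examples ({share:.1%})")
```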