קול משותף

הפניות:

אב

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ab')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 8
'other' 752
'test' 9
'train' 22
'validated' 31
'validation' 0
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ar

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ar')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 6333
'other' 18283
'test' 7622
'train' 14227
'validated' 43291
'validation' 7517
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

כפי ש

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/as')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 31
'other' 0
'test' 110
'train' 270
'validated' 504
'validation' 124
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

br

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/br')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 623
'other' 10912
'test' 2087
'train' 2780
'validated' 8560
'validation' 1997
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

כ

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ca')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 18846
'other' 64446
'test' 15724
'train' 285584
'validated' 416701
'validation' 15724
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cnh

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/cnh')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 433
'other' 2934
'test' 752
'train' 807
'validated' 2432
'validation' 756
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cs

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/cs')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 685
'other' 7475
'test' 4144
'train' 5655
'validated' 30431
'validation' 4118
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

קו"ח

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/cv')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 1282
'other' 6927
'test' 788
'train' 931
'validated' 3496
'validation' 818
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cy

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/cy')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 3648
'other' 17919
'test' 4820
'train' 6839
'validated' 72984
'validation' 4776
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

דה

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/de')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 32789
'other' 10095
'test' 15588
'train' 246525
'validated' 565186
'validation' 15588
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

dv

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/dv')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 840
'other' 0
'test' 2202
'train' 2680
'validated' 11866
'validation' 2077
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

אל

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/el')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 185
'other' 5659
'test' 1522
'train' 2316
'validated' 5996
'validation' 1401
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

he

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/en')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 189562
'other' 169895
'test' 16164
'train' 564337
'validated' 1224864
'validation' 16164
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

eo

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/eo')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 4736
'other' 2946
'test' 8969
'train' 19587
'validated' 58094
'validation' 8987
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

es

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/es')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 40640
'other' 144791
'test' 15089
'train' 161813
'validated' 236314
'validation' 15089
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

et

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/et')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 3557
'other' 569
'test' 2509
'train' 2966
'validated' 10683
'validation' 2507
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

אירופה

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/eu')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 5387
'other' 23570
'test' 5172
'train' 7505
'validated' 63009
'validation' 5172
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fa

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/fa')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 11698
'other' 22510
'test' 5213
'train' 7593
'validated' 251659
'validation' 5213
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fi

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/fi')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 59
'other' 149
'test' 428
'train' 460
'validated' 1305
'validation' 415
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fr

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/fr')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 40351
'other' 3222
'test' 15763
'train' 298982
'validated' 461004
'validation' 15763
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fy-NL

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/fy-NL')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 1031
'other' 21569
'test' 3020
'train' 3927
'validated' 10495
'validation' 2790
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ga-IE

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ga-IE')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 409
'other' 2130
'test' 506
'train' 541
'validated' 3352
'validation' 497
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

היי

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/hi')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 60
'other' 139
'test' 127
'train' 157
'validated' 419
'validation' 135
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

hsb

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/hsb')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 227
'other' 62
'test' 387
'train' 808
'validated' 1367
'validation' 172
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

hu

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/hu')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 169
'other' 295
'test' 1649
'train' 3348
'validated' 6457
'validation' 1434
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ia

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ia')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 192
'other' 1095
'test' 899
'train' 3477
'validated' 5978
'validation' 1601
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

תְעוּדַת זֶהוּת

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/id')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 470
'other' 6782
'test' 1844
'train' 2130
'validated' 8696
'validation' 1835
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

זה

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/it')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 12189
'other' 14549
'test' 12928
'train' 58015
'validated' 102579
'validation' 12928
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

כן

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ja')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 504
'other' 885
'test' 632
'train' 722
'validated' 3072
'validation' 586
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ka

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ka')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 139
'other' 44
'test' 656
'train' 1058
'validated' 2275
'validation' 527
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

kab

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/kab')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 18134
'other' 88021
'test' 14622
'train' 120530
'validated' 573718
'validation' 14622
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ky

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ky')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 926
'other' 7223
'test' 1503
'train' 1955
'validated' 9236
'validation' 1511
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

lg

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/lg')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 290
'other' 3110
'test' 584
'train' 1250
'validated' 2220
'validation' 384
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

לט

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/lt')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 102
'other' 1629
'test' 466
'train' 931
'validated' 1644
'validation' 244
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

lv

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/lv')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 143
'other' 1560
'test' 1882
'train' 2552
'validated' 6444
'validation' 2002
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

מנ

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/mn')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 667
'other' 3272
'test' 1862
'train' 2183
'validated' 7487
'validation' 1837
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

הר

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/mt')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 314
'other' 5714
'test' 1617
'train' 2036
'validated' 5747
'validation' 1516
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

nl

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/nl')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 3308
'other' 27
'test' 5708
'train' 9460
'validated' 52488
'validation' 4938
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

אוֹ

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/or')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 62
'other' 4302
'test' 98
'train' 388
'validated' 615
'validation' 129
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

כְּאֵב

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/pa-IN')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 43
'other' 1411
'test' 116
'train' 211
'validated' 371
'validation' 44
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

pl

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/pl')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 4601
'other' 12848
'test' 5153
'train' 7468
'validated' 90791
'validation' 5153
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

pt

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/pt')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 1740
'other' 8390
'test' 4641
'train' 6514
'validated' 41584
'validation' 4592
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

rm-sursilv

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/rm-sursilv')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 639
'other' 2102
'test' 1194
'train' 1384
'validated' 3783
'validation' 1205
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

rm-vallader

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/rm-vallader')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 374
'other' 727
'test' 378
'train' 574
'validated' 1316
'validation' 357
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ro

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ro')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 485
'other' 1945
'test' 1778
'train' 3399
'validated' 6039
'validation' 858
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ru

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ru')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 3056
'other' 10247
'test' 8007
'train' 15481
'validated' 74256
'validation' 7963
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

rw

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/rw')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 206790
'other' 22923
'test' 15724
'train' 515197
'validated' 832929
'validation' 15032
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

סאה

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/sah')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 66
'other' 1275
'test' 757
'train' 1442
'validated' 2606
'validation' 405
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

sl

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/sl')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 92
'other' 2502
'test' 881
'train' 2038
'validated' 4669
'validation' 556
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

sv-SE

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/sv-SE')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 462
'other' 3043
'test' 2027
'train' 2331
'validated' 12552
'validation' 2019
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ta

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/ta')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 594
'other' 7428
'test' 1781
'train' 2009
'validated' 12652
'validation' 1779
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ה'

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/th')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 467
'other' 2671
'test' 2188
'train' 2917
'validated' 7028
'validation' 1922
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

tr

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/tr')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 1726
'other' 325
'test' 1647
'train' 1831
'validated' 18685
'validation' 1647
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

tt

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/tt')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 287
'other' 1798
'test' 4485
'train' 11211
'validated' 25781
'validation' 2127
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

בְּרִיטַנִיָה

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/uk')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 1255
'other' 8161
'test' 3235
'train' 4035
'validated' 22337
'validation' 3236
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

vi

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/vi')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 78
'other' 870
'test' 198
'train' 221
'validated' 619
'validation' 200
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

הצבעה

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/vot')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 6
'other' 411
'test' 0
'train' 3
'validated' 3
'validation' 0
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

zh-CN

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/zh-CN')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 5305
'other' 8948
'test' 8760
'train' 18541
'validated' 36405
'validation' 8743
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

zh-HK

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/zh-HK')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 2999
'other' 38830
'test' 5172
'train' 7506
'validated' 41835
'validation' 5172
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

zh-TW

השתמש בפקודה הבאה כדי לטעון מערך נתונים זה ב-TFDS:

ds = tfds.load('huggingface:common_voice/zh-TW')
  • תיאור :
Common Voice is Mozilla's initiative to help teach machines how real people speak.
The dataset currently consists of 7,335 validated hours of speech in 60 languages, but we’re always adding more voices and languages.
לְפַצֵל דוגמאות
'invalidated' 3584
'other' 22477
'test' 2895
'train' 3507
'validated' 61232
'validation' 2895
  • מאפיינים :
{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "path": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 48000,
        "mono": true,
        "decode": true,
        "id": null,
        "_type": "Audio"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "up_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "down_votes": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "age": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gender": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "accent": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "locale": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "segment": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}