References:
af-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/af-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 275512 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"af",
"en"
],
"id": null,
"_type": "Translation"
}
}
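The Features block above suggests that each example carries a `translation` entry keyed by language code. As a minimal sketch, assuming a record has already been converted to a plain Python dict of strings, the source and target sentences could be separated as follows. The record layout, the helper name `split_pair`, and the sample sentences are illustrative assumptions, not part of the TFDS API or the dataset.

```python
def split_pair(example, source_lang, target_lang):
    """Return (source, target) text from a translation record.

    Assumes `example` looks like {"translation": {"af": "...", "en": "..."}},
    i.e. one string per language code listed in the Features block.
    """
    translation = example["translation"]
    return translation[source_lang], translation[target_lang]


# Hypothetical record, written by hand for illustration only.
record = {"translation": {"af": "Goeie more.", "en": "Good morning."}}
source, target = split_pair(record, "af", "en")
```

Swapping the two language arguments gives the English-to-Afrikaans direction from the same record.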
am-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/am-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 89027 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"am",
"en"
],
"id": null,
"_type": "Translation"
}
}
an-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/an-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'train' | 6961 |
- Features:
{
"translation": {
"languages": [
"an",
"en"
],
"id": null,
"_type": "Translation"
}
}
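Some configurations, such as an-en above, list only a 'train' split. One possible way to handle this is to carve a validation portion out of 'train'. The sketch below assumes that the TFDS split-slicing syntax (for example `train[:90%]`) works for this community dataset. The helper name `choose_splits` and the 10% holdout are illustrative choices.

```python
def choose_splits(available, holdout_percent=10):
    """Map logical split names to TFDS split strings.

    `available` is the collection of split names a configuration provides.
    When no 'validation' split exists, the tail of 'train' is held out.
    """
    if "validation" in available:
        return {name: name for name in available}
    keep = 100 - holdout_percent
    return {
        "train": f"train[:{keep}%]",
        "validation": f"train[{keep}%:]",
    }


# an-en lists only 'train', so a validation slice is carved out of it.
splits = choose_splits(["train"])
# The resulting strings can be passed as `split=` to tfds.load, e.g.
# tfds.load('huggingface:opus100/an-en', split=splits["train"])
```

Configurations that already ship 'test' and 'validation' splits pass through unchanged.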
ar-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/ar-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"ar",
"en"
],
"id": null,
"_type": "Translation"
}
}
as-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/as-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 138479 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"as",
"en"
],
"id": null,
"_type": "Translation"
}
}
az-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/az-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 262089 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"az",
"en"
],
"id": null,
"_type": "Translation"
}
}
be-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/be-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 67312 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"be",
"en"
],
"id": null,
"_type": "Translation"
}
}
bg-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/bg-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"bg",
"en"
],
"id": null,
"_type": "Translation"
}
}
bn-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/bn-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"bn",
"en"
],
"id": null,
"_type": "Translation"
}
}
br-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/br-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 153447 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"br",
"en"
],
"id": null,
"_type": "Translation"
}
}
bs-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/bs-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"bs",
"en"
],
"id": null,
"_type": "Translation"
}
}
ca-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/ca-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"ca",
"en"
],
"id": null,
"_type": "Translation"
}
}
cs-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/cs-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"cs",
"en"
],
"id": null,
"_type": "Translation"
}
}
cy-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/cy-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 289521 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"cy",
"en"
],
"id": null,
"_type": "Translation"
}
}
da-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/da-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"da",
"en"
],
"id": null,
"_type": "Translation"
}
}
de-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/de-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"de",
"en"
],
"id": null,
"_type": "Translation"
}
}
dz-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/dz-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'train' | 624 |
- Features:
{
"translation": {
"languages": [
"dz",
"en"
],
"id": null,
"_type": "Translation"
}
}
el-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/el-en')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"el",
"en"
],
"id": null,
"_type": "Translation"
}
}
en-eo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-eo')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 337106 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"eo"
],
"id": null,
"_type": "Translation"
}
}
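The configuration names listed so far (af-en, el-en, en-eo) appear to put the two language codes in alphabetical order, so English comes second for codes that sort before "en" and first otherwise. Below is a small sketch of a helper that builds the TFDS name under that assumption. The function names `opus100_name` and `load_opus100` are hypothetical, and `tensorflow_datasets` is imported lazily so the name logic can be used without it installed.

```python
def opus100_name(lang):
    """Build the TFDS name for the English-`lang` pair of OPUS-100."""
    if lang == "en":
        raise ValueError("OPUS-100 pairs combine English with another language")
    # Assumption: config names list the two codes in alphabetical order.
    first, second = sorted([lang, "en"])
    return f"huggingface:opus100/{first}-{second}"


def load_opus100(lang, split="train"):
    """Load one split of a pair; requires tensorflow_datasets to be installed."""
    import tensorflow_datasets as tfds  # lazy import keeps opus100_name dependency-free

    return tfds.load(opus100_name(lang), split=split)
```

With this helper, `opus100_name("af")` yields the same string used in the af-en load command, and `opus100_name("eo")` yields the one used for en-eo.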
en-es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-es')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"es"
],
"id": null,
"_type": "Translation"
}
}
en-et
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-et')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"et"
],
"id": null,
"_type": "Translation"
}
}
en-eu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-eu')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"eu"
],
"id": null,
"_type": "Translation"
}
}
en-fa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-fa')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"fa"
],
"id": null,
"_type": "Translation"
}
}
en-fi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-fi')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"fi"
],
"id": null,
"_type": "Translation"
}
}
en-fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-fr')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"fr"
],
"id": null,
"_type": "Translation"
}
}
en-fy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-fy')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 54342 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"fy"
],
"id": null,
"_type": "Translation"
}
}
en-ga
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ga')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 289524 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ga"
],
"id": null,
"_type": "Translation"
}
}
en-gd
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-gd')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1606 |
'train' | 16316 |
'validation' | 1605 |
- Features:
{
"translation": {
"languages": [
"en",
"gd"
],
"id": null,
"_type": "Translation"
}
}
en-gl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-gl')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 515344 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"gl"
],
"id": null,
"_type": "Translation"
}
}
en-gu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-gu')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 318306 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"gu"
],
"id": null,
"_type": "Translation"
}
}
en-ha
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ha')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 97983 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ha"
],
"id": null,
"_type": "Translation"
}
}
en-he
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-he')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"he"
],
"id": null,
"_type": "Translation"
}
}
en-hi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-hi')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 534319 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"hi"
],
"id": null,
"_type": "Translation"
}
}
en-hr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-hr')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"hr"
],
"id": null,
"_type": "Translation"
}
}
en-hu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-hu')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"hu"
],
"id": null,
"_type": "Translation"
}
}
en-hy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-hy')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'train' | 7059 |
- Features:
{
"translation": {
"languages": [
"en",
"hy"
],
"id": null,
"_type": "Translation"
}
}
en-id
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-id')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"id"
],
"id": null,
"_type": "Translation"
}
}
en-ig
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ig')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1843 |
'train' | 18415 |
'validation' | 1843 |
- Features:
{
"translation": {
"languages": [
"en",
"ig"
],
"id": null,
"_type": "Translation"
}
}
en-is
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-is')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"is"
],
"id": null,
"_type": "Translation"
}
}
en-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-it')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"it"
],
"id": null,
"_type": "Translation"
}
}
en-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ja')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ja"
],
"id": null,
"_type": "Translation"
}
}
en-ka
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ka')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 377306 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ka"
],
"id": null,
"_type": "Translation"
}
}
en-kk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-kk')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 79927 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"kk"
],
"id": null,
"_type": "Translation"
}
}
en-km
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-km')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 111483 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"km"
],
"id": null,
"_type": "Translation"
}
}
en-ko
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ko')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ko"
],
"id": null,
"_type": "Translation"
}
}
en-kn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-kn')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 918 |
'train' | 14537 |
'validation' | 917 |
- Features:
{
"translation": {
"languages": [
"en",
"kn"
],
"id": null,
"_type": "Translation"
}
}
en-ku
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ku')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 144844 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ku"
],
"id": null,
"_type": "Translation"
}
}
en-ky
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ky')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 27215 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ky"
],
"id": null,
"_type": "Translation"
}
}
en-li
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-li')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 25535 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"li"
],
"id": null,
"_type": "Translation"
}
}
en-lt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-lt')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"lt"
],
"id": null,
"_type": "Translation"
}
}
en-lv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-lv')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"lv"
],
"id": null,
"_type": "Translation"
}
}
en-mg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-mg')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 590771 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"mg"
],
"id": null,
"_type": "Translation"
}
}
en-mk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-mk')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"mk"
],
"id": null,
"_type": "Translation"
}
}
en-ml
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ml')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 822746 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ml"
],
"id": null,
"_type": "Translation"
}
}
en-mn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-mn')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'train' | 4294 |
- Features:
{
"translation": {
"languages": [
"en",
"mn"
],
"id": null,
"_type": "Translation"
}
}
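The en-mn config lists only a 'train' split, so requesting 'test' or 'validation' would presumably fail. One possible workaround is to carve a held-out portion out of 'train' with TFDS split slicing strings such as `train[:90%]`. A minimal sketch, assuming the slicing syntax is honored for `huggingface:` datasets; `holdout_splits` and `load_en_mn_with_holdout` are hypothetical names, not part of TFDS.

```python
def holdout_splits(eval_percent: int = 10) -> dict:
    """Return TFDS slicing strings that split 'train' into two parts.

    eval_percent is the share of 'train' reserved as a stand-in
    validation set (an illustrative choice, not an OPUS-100 convention).
    """
    if not 0 < eval_percent < 100:
        raise ValueError("eval_percent must be between 1 and 99")
    cut = 100 - eval_percent
    return {
        "train": f"train[:{cut}%]",
        "validation": f"train[{cut}%:]",
    }


def load_en_mn_with_holdout(eval_percent: int = 10) -> dict:
    """Load en-mn and carve a validation subset out of its only split."""
    # Imported lazily so holdout_splits can be used without TFDS installed.
    import tensorflow_datasets as tfds

    name = "huggingface:opus100/en-mn"
    return {
        key: tfds.load(name, split=spec)
        for key, spec in holdout_splits(eval_percent).items()
    }


if __name__ == "__main__":
    print(holdout_splits(10))
```

Because the two slices come from the same 4294 training pairs, any score on the held-out part is not comparable to results on the official 2000-example test sets of the other pairs.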
en-mr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-mr')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 27007 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"mr"
],
"id": null,
"_type": "Translation"
}
}
en-ms
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ms')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ms"
],
"id": null,
"_type": "Translation"
}
}
en-mt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-mt')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"mt"
],
"id": null,
"_type": "Translation"
}
}
en-my
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-my')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 24594 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"my"
],
"id": null,
"_type": "Translation"
}
}
en-nb
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-nb')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 142906 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"nb"
],
"id": null,
"_type": "Translation"
}
}
en-ne
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ne')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 406381 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"ne"
],
"id": null,
"_type": "Translation"
}
}
en-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-nl')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"nl"
],
"id": null,
"_type": "Translation"
}
}
en-nn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-nn')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 486055 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"nn"
],
"id": null,
"_type": "Translation"
}
}
en-no
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-no')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"no"
],
"id": null,
"_type": "Translation"
}
}
en-oc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-oc')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 35791 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"oc"
],
"id": null,
"_type": "Translation"
}
}
en-or
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-or')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1318 |
'train' | 14273 |
'validation' | 1317 |
- Features:
{
"translation": {
"languages": [
"en",
"or"
],
"id": null,
"_type": "Translation"
}
}
en-pa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-pa')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 107296 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"pa"
],
"id": null,
"_type": "Translation"
}
}
en-pl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-pl')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 1000000 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",
"pl"
],
"id": null,
"_type": "Translation"
}
}
en-ps
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:opus100/en-ps')
- Description:
OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.
- License: No known license
- Version: 0.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2000 |
'train' | 79127 |
'validation' | 2000 |
- Features:
{
"translation": {
"languages": [
"en",