opus100

af-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/af-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	275512
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "af",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}
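
The size thresholds quoted in the description (1M, at least 100k, at least 10k) can be checked against the split tables on this page. The sketch below is only an illustration and not part of TFDS: `TRAIN_SIZES` copies a few `'train'` counts from the tables here, and `size_bucket` is a hypothetical helper name. Note that the description counts pairs cumulatively, while this helper assigns each pair to a single, non-overlapping bucket.

```python
# Hypothetical helper: bucket a language pair by its 'train' split size,
# using the thresholds quoted in the OPUS-100 description.
TRAIN_SIZES = {
    "af-en": 275512,
    "am-en": 89027,
    "an-en": 6961,
    "ar-en": 1000000,
}


def size_bucket(num_train: int) -> str:
    """Return the largest threshold that the training set size reaches."""
    if num_train >= 1_000_000:
        return "1M"
    if num_train >= 100_000:
        return ">=100k"
    if num_train >= 10_000:
        return ">=10k"
    return "<10k"


buckets = {pair: size_bucket(n) for pair, n in TRAIN_SIZES.items()}
for pair, bucket in buckets.items():
    print(f"{pair}: {bucket}")
```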

am-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/am-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	89027
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "am",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

an-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/an-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	6961

Features:

{
    "translation": {
        "languages": [
            "an",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}
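
This config lists only a `'train'` split, so code that expects `'validation'` or `'test'` needs a fallback. The following is a minimal sketch, assuming held-out data is carved from `'train'` with the TFDS percent-slicing split syntax (for example `'train[:90%]'`). The function name `resolve_splits` and the 90/5/5 ratio are illustrative choices, not something the dataset prescribes.

```python
def resolve_splits(available):
    """Map logical split names to TFDS split strings.

    `available` is the collection of split names a config provides,
    e.g. {"train"} for an-en or {"train", "validation", "test"} for af-en.
    """
    available = set(available)
    if {"train", "validation", "test"} <= available:
        # The config ships its own held-out data: use it unchanged.
        return {"train": "train", "validation": "validation", "test": "test"}
    # Assumed fallback: carve held-out data out of 'train' (90/5/5).
    return {
        "train": "train[:90%]",
        "validation": "train[90%:95%]",
        "test": "train[95%:]",
    }


an_en = resolve_splits({"train"})
af_en = resolve_splits({"train", "validation", "test"})
print(an_en["validation"])  # train[90%:95%]
```

Each resulting string should be usable as the `split` argument of `tfds.load`, for example `tfds.load('huggingface:opus100/an-en', split=an_en['train'])`.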

ar-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ar-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "ar",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}
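
The Features block is plain JSON, so the language codes of a config can be read programmatically. This is a minimal sketch using only the Python standard library; the JSON string is copied from the block above, and the variable names are illustrative. It says nothing about how TFDS represents individual examples at load time.

```python
import json

# Feature description copied from the ar-en entry on this page.
FEATURES_JSON = """
{
    "translation": {
        "languages": ["ar", "en"],
        "id": null,
        "_type": "Translation"
    }
}
"""

features = json.loads(FEATURES_JSON)
languages = features["translation"]["languages"]

# The non-English side of the pair is whichever code is not "en".
other = [code for code in languages if code != "en"]
print(languages, other)
```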

as-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/as-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	138479
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "as",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

az-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/az-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	262089
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "az",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

be-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/be-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	67312
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "be",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

bg-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/bg-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "bg",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

bn-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/bn-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "bn",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

br-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/br-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	153447
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "br",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

bs-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/bs-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "bs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ca-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ca-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "ca",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

cs-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/cs-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "cs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

cy-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/cy-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	289521
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "cy",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

da-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/da-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "da",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/de-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "de",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

dz-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/dz-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	624

Features:

{
    "translation": {
        "languages": [
            "dz",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

el-en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/el-en')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "el",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-eo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-eo')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	337106
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "eo"
        ],
        "id": null,
        "_type": "Translation"
    }
}
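
The config names on this page appear to put the two language codes in alphabetical order: `af-en` through `el-en` place English second, while from `en-eo` onward English comes first. Below is a minimal sketch of a helper that builds the TFDS name under that assumption. `opus100_name` is a hypothetical function, and the ordering rule is inferred from the names listed here rather than taken from any documentation.

```python
def opus100_name(lang: str) -> str:
    """Build the TFDS dataset name for the pair (lang, English).

    Assumes config names order the two codes alphabetically,
    as the entries on this page suggest.
    """
    if lang == "en":
        raise ValueError("OPUS-100 pairs English with another language")
    first, second = sorted([lang, "en"])
    return f"huggingface:opus100/{first}-{second}"


print(opus100_name("af"))  # huggingface:opus100/af-en
print(opus100_name("fr"))  # huggingface:opus100/en-fr
```

The returned string has the same form as the argument passed to `tfds.load` in the commands on this page.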

en-es

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-es')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "es"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-et

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-et')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "et"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-eu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-eu')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "eu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-fa')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "fa"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-fi')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "fi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-fr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-fy')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	54342
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "fy"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ga

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ga')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	289524
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ga"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-gd

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-gd')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	1606
`'train'`	16316
`'validation'`	1605

Features:

{
    "translation": {
        "languages": [
            "en",
            "gd"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-gl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-gl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	515344
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "gl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-gu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-gu')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	318306
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "gu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ha

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ha')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	97983
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ha"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-he

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-he')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "he"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-hi')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	534319
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-hr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "hr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-hu')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "hu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-hy')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	7059

Features:

{
    "translation": {
        "languages": [
            "en",
            "hy"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-id

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-id')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "id"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ig

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ig')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	1843
`'train'`	18415
`'validation'`	1843

Features:

{
    "translation": {
        "languages": [
            "en",
            "ig"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-is

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-is')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "is"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-it

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-it')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "it"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ja

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ja')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ka

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ka')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	377306
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ka"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-kk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-kk')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	79927
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "kk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-km

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-km')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	111483
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "km"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ko

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ko')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-kn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-kn')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	918
`'train'`	14537
`'validation'`	917

Features:

{
    "translation": {
        "languages": [
            "en",
            "kn"
        ],
        "id": null,
        "_type": "Translation"
    }
}
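
The description groups language pairs by training-set size (1M, at least 100k, at least 10k). As a rough sketch, the same grouping can be applied to the `'train'` counts listed on this page. The function name `size_bucket` and the bucket labels are illustrative choices, not part of the dataset metadata.

```python
def size_bucket(train_examples: int) -> str:
    """Map a training-set size to the tiers named in the description."""
    if train_examples >= 1_000_000:
        return "1M"
    if train_examples >= 100_000:
        return "100k+"
    if train_examples >= 10_000:
        return "10k+"
    return "<10k"


# 'train' counts copied from the split tables on this page.
train_sizes = {
    "en-ka": 377306,
    "en-kk": 79927,
    "en-km": 111483,
    "en-ko": 1000000,
    "en-kn": 14537,
}

for pair, size in train_sizes.items():
    print(pair, size_bucket(size))
```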

en-ku

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ku')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	144844
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ku"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ky

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ky')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	27215
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ky"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-li

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-li')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	25535
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "li"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-lt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-lt')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "lt"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-lv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-lv')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "lv"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-mg

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-mg')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	590771
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "mg"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-mk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-mk')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "mk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ml

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ml')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	822746
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ml"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-mn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-mn')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	4294

Features:

{
    "translation": {
        "languages": [
            "en",
            "mn"
        ],
        "id": null,
        "_type": "Translation"
    }
}
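
Because `en-mn` ships only a `'train'` split, a held-out set has to be carved out of it. The sketch below builds TFDS split-slicing strings for that purpose. It assumes the percent-slicing syntax (`train[:90%]`) is available for this Hugging Face-backed dataset as it is for native TFDS datasets. The names `holdout_splits` and `load_en_mn_with_holdout` are hypothetical.

```python
from typing import Tuple


def holdout_splits(holdout_percent: int = 10) -> Tuple[str, str]:
    """Return (train, holdout) split strings that partition 'train'."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_en_mn_with_holdout(holdout_percent: int = 10):
    """Load en-mn as two non-overlapping datasets (hypothetical wrapper)."""
    # Imported lazily so the split helper works without TFDS installed.
    import tensorflow_datasets as tfds

    train_split, holdout_split = holdout_splits(holdout_percent)
    return tfds.load(
        "huggingface:opus100/en-mn", split=[train_split, holdout_split]
    )


print(holdout_splits(10))
```

With only 4294 examples in total, any held-out portion will be small, so evaluation scores on it should be read with that in mind.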

en-mr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-mr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	27007
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "mr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ms

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ms')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ms"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-mt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-mt')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "mt"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-my

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-my')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	24594
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "my"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-nb

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-nb')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	142906
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "nb"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ne

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ne')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	406381
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ne"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-nl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-nl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "nl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-nn

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-nn')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	486055
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "nn"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-no

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-no')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "no"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-oc

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-oc')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	35791
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "oc"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-or

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-or')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	1318
`'train'`	14273
`'validation'`	1317

Features:

{
    "translation": {
        "languages": [
            "en",
            "or"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-pa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-pa')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	107296
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "pa"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-pl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-pl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "pl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ps

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ps')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	79127
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ps"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-pt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-pt')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "pt"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ro

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ro')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ro"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ru')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-rw

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-rw')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	173823
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "rw"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-se

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-se')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	35907
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "se"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-sh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	267211
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "sh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-si

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-si')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	979109
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "si"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-sk')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "sk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-sl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "sl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sq

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-sq')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "sq"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-sr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "sr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-sv

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-sv')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "sv"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ta

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ta')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	227014
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ta"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-te

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-te')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	64352
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "te"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-tg

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-tg')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	193882
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "tg"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-th

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-th')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "th"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-tk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-tk')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	1852
`'train'`	13110
`'validation'`	1852

Features:

{
    "translation": {
        "languages": [
            "en",
            "tk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-tr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-tr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "tr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-tt

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-tt')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	100843
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "tt"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ug

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ug')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	72170
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ug"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-uk

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-uk')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ur

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-ur')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	753913
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "ur"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-uz

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-uz')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	173157
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "uz"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-vi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-vi')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "vi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-wa

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-wa')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	104496
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "wa"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-xh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-xh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	439671
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "xh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-yi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-yi')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	15010
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "yi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-yo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-yo')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'train'`	10375

Features:

{
    "translation": {
        "languages": [
            "en",
            "yo"
        ],
        "id": null,
        "_type": "Translation"
    }
}
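
Not every config ships all three splits: en-yo lists only `'train'`, and the non-English pairs later on this page list only `'test'`. A minimal sketch of guarding against a missing split follows, with the counts copied from the tables on this page. The `SPLITS` table and the `pick_splits` helper are illustrative names, not part of TFDS.

```python
# Split sizes copied from the tables on this page (a small subset).
SPLITS = {
    "en-yi": {"test": 2000, "train": 15010, "validation": 2000},
    "en-yo": {"train": 10375},
    "ar-de": {"test": 2000},
}


def pick_splits(config, wanted=("train", "validation", "test")):
    """Return the wanted split names that the given config actually has."""
    available = SPLITS[config]
    return [name for name in wanted if name in available]


en_yo_splits = pick_splits("en-yo")
ar_de_splits = pick_splits("ar-de")
```

The resulting list can then be passed as the `split` argument of `tfds.load`, so a request for a split the config does not provide is never made.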

en-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-zh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	1000000
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-zu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/en-zu')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000
`'train'`	38616
`'validation'`	2000

Features:

{
    "translation": {
        "languages": [
            "en",
            "zu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-de

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ar-de')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "ar",
            "de"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ar-fr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "ar",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-nl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ar-nl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "ar",
            "nl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ar-ru')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "ar",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ar-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ar-zh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "ar",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/de-fr')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "de",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-nl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/de-nl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "de",
            "nl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/de-ru')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "de",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/de-zh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "de",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-nl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/fr-nl')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "fr",
            "nl"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/fr-ru')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "fr",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/fr-zh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "fr",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/nl-ru')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "nl",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

nl-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/nl-zh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "nl",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ru-zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:opus100/ru-zh')

Description:

OPUS-100 is English-centric, meaning that all training pairs include English on either the source or target side.
The corpus covers 100 languages (including English). OPUS-100 contains approximately 55M sentence pairs.
Of the 99 language pairs, 44 have 1M sentence pairs of training data, 73 have at least 100k, and 95 have at least 10k.

License: No known license
Version: 0.0.0
Splits:

Split	Examples
`'test'`	2000

Features:

{
    "translation": {
        "languages": [
            "ru",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}
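
Every config name listed on this page puts its two language codes in alphabetical order (`en-th` rather than `th-en`, `ar-de` rather than `de-ar`). Assuming that ordering holds for all configs, the name passed to `tfds.load` could be built from an unordered pair roughly as follows. `tfds_name` is a hypothetical helper, not a TFDS function.

```python
def tfds_name(lang_a, lang_b):
    """Build the TFDS dataset name for a language pair, in either order."""
    # Assumption: config names always list the two codes alphabetically,
    # as they do for every config shown on this page.
    first, second = sorted((lang_a, lang_b))
    return f"huggingface:opus100/{first}-{second}"


name = tfds_name("th", "en")
```

The resulting string is the same argument used in the load commands above, for example `tfds.load(name)`.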