giga_fren

en-fr

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:giga_fren/en-fr')
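
As a minimal sketch of how the load command might be used in practice, the helper below loads the `train` split and returns a few examples as NumPy data. The names `DATASET_NAME` and `peek` are hypothetical and exist only for illustration. It assumes that `tensorflow_datasets` is installed together with the Hugging Face `datasets` package, which TFDS needs for `huggingface:` datasets.

```python
DATASET_NAME = "huggingface:giga_fren/en-fr"  # dataset name from the load command


def peek(num_examples=3):
    """Load the 'train' split and return the first few examples as NumPy data.

    `peek` is a hypothetical helper, not part of TFDS.
    """
    # Imported lazily so that defining this helper does not require TFDS.
    import tensorflow_datasets as tfds

    ds = tfds.load(DATASET_NAME, split="train")
    # tfds.as_numpy converts tf.Tensor values into NumPy arrays and bytes.
    return list(tfds.as_numpy(ds.take(num_examples)))
```

Note that the first call to `peek()` is likely to be slow, because `tfds.load` downloads and prepares the dataset before any example can be read.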

Description:

Giga-word corpus for French-English from WMT2010 collected by Chris Callison-Burch
2 languages, total number of files: 452
total number of tokens: 1.43G
total number of sentence fragments: 47.55M

License: No known license
Version: 2.0.0
Splits:

Split	Examples
`'train'`	22519904
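
Since only a `train` split is listed, a held-out set has to be carved out of it. One possible approach, sketched below, is to build split strings in the TFDS slicing syntax (for example `train[:99%]`). The helper `holdout_splits` and the 1% default are assumptions made for illustration, not something this dataset prescribes.

```python
def holdout_splits(base="train", holdout_percent=1):
    """Return (train_split, validation_split) strings in TFDS slicing syntax.

    `holdout_splits` is a hypothetical helper; `holdout_percent` is a whole
    number of percent taken from the end of the base split.
    """
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return f"{base}[:{cut}%]", f"{base}[{cut}%:]"


train_split, valid_split = holdout_splits()
print(train_split, valid_split)  # train[:99%] train[99%:]
```

The two strings can then be passed together, for example `tfds.load('huggingface:giga_fren/en-fr', split=[train_split, valid_split])`, which returns one dataset per requested split.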

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}
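
To illustrate the feature structure above, the sketch below extracts the identifier and the two sentences from a single example. It assumes that each example arrives as a dict whose `translation` entry is itself a dict keyed by language code, and that strings may arrive as UTF-8 bytes (as they do after `tfds.as_numpy`). The helper `to_pair` and the sample record are hypothetical and not taken from the corpus.

```python
def to_pair(example):
    """Return (id, english, french) as Python strings from one example dict."""

    def text(value):
        # Values may be bytes (NumPy output) or already str.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    translation = example["translation"]
    return text(example["id"]), text(translation["en"]), text(translation["fr"])


# Hypothetical record shaped like the features above (not real corpus data).
sample = {"id": b"0", "translation": {"en": b"Hello", "fr": b"Bonjour"}}
print(to_pair(sample))  # ('0', 'Hello', 'Bonjour')
```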

Last updated 2022-06-28 UTC.