joya

Referencias:

mlsum_de

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/mlsum_de')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_covid' 5058
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 10695
'train' 220748
'validation' 11392
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "topic": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "date": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

mlsum_es

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/mlsum_es')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_covid' 1938
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 13366
'train' 259888
'validation' 9977
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "topic": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "date": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_es_en_v0

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_es_en_v0')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 19797
'train' 79515
'validation' 8835
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_ru_en_v0

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_ru_en_v0')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 9094
'train' 36898
'validation' 4100
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_tr_en_v0

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_tr_en_v0')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 808
'train' 3193
'validation' 355
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_vi_en_v0

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_vi_en_v0')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 2167
'train' 9206
'validation' 1023
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_arabe_ar

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_arabic_ar')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 5841
'train' 20441
'validation' 2919
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "ar",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "ar",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_chino_zh

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_chinese_zh')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 3775
'train' 13211
'validation' 1886
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "zh",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "zh",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_czech_cs

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_czech_cs')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 1438
'train' 5033
'validation' 718
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "cs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "cs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_holandés_nl

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_dutch_nl')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 6248
'train' 21866
'validation' 3123
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "nl",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "nl",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_english_en

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_english_en')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 28614
'train' 99020
'validation' 13823
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "en",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "en",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_french_fr

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_french_fr')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 12731
'train' 44556
'validation' 6364
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "fr",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "fr",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_german_de

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_german_de')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 11669
'train' 40839
'validation' 5833
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "de",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "de",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_hindi_hi

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_hindi_hi')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 1984
'train' 6942
'validation' 991
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "hi",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "hi",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_indonesian_id

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_indonesian_id')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 9497
'train' 33237
'validation' 4747
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "id",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "id",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_italian_it

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_italian_it')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 10189
'train' 35661
'validation' 5093
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "it",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "it",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_japonés_ja

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_japanese_ja')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 2530
'train' 8853
'validation' 1264
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "ja",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "ja",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_coreano_ko

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_korean_ko')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 2436
'train' 8524
'validation' 1216
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "ko",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "ko",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_portugués_pt

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_portuguese_pt')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 16331
'train' 57159
'validation' 8165
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "pt",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "pt",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_ruso_ru

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_russian_ru')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 10580
'train' 37028
'validation' 5288
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "ru",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "ru",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_espanol_es

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_spanish_es')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 22632
'train' 79212
'validation' 11316
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "es",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "es",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_thai_th

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_thai_th')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 2950
'train' 10325
'validation' 1475
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "th",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "th",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_turco_tr

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_turkish_tr')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 900
'train' 3148
'validation' 449
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "tr",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "tr",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

wiki_lingua_vietnamita_vi

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_lingua_vietnamese_vi')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 3917
'train' 13707
'validation' 1957
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_aligned": {
        "languages": [
            "vi",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "target_aligned": {
        "languages": [
            "vi",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

xsum

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/xsum')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_backtranslation' 500
'challenge_test_bfp_02' 500
'challenge_test_bfp_05' 500
'challenge_test_covid' 401
'challenge_test_nopunc' 500
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 1166
'train' 23206
'validation' 1117
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "xsum_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "document": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

common_gen

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/common_gen')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_scramble' 500
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 1497
'train' 67389
'validation' 993
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "concept_set_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "concepts": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

cs_restaurantes

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/cs_restaurants')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_scramble' 500
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 842
'train' 3569
'validation' 781
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialog_act": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialog_act_delexicalized": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target_delexicalized": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

dardo

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/dart')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'test' 5097
'train' 62659
'validation' 2768
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dart_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "tripleset": [
        [
            {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        ]
    ],
    "subtree_was_extended": {
        "dtype": "bool",
        "id": null,
        "_type": "Value"
    },
    "target_sources": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

e2e_nlg

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/e2e_nlg')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_scramble' 500
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 4693
'train' 33525
'validation' 4299
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "meaning_representation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

todo

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/totto')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_scramble' 500
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 7700
'train' 121153
'validation' 7700
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "totto_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "table_page_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "table_webpage_url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "table_section_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "table_section_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "table": [
        [
            {
                "column_span": {
                    "dtype": "int32",
                    "id": null,
                    "_type": "Value"
                },
                "is_header": {
                    "dtype": "bool",
                    "id": null,
                    "_type": "Value"
                },
                "row_span": {
                    "dtype": "int32",
                    "id": null,
                    "_type": "Value"
                },
                "value": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                }
            }
        ]
    ],
    "highlighted_cells": [
        [
            {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            }
        ]
    ],
    "example_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence_annotations": [
        {
            "original_sentence": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "sentence_after_deletion": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "sentence_after_ambiguity": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "final_sentence": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        }
    ],
    "overlap_subset": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

web_nlg_es

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/web_nlg_en')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_numbers' 500
'challenge_test_scramble' 500
'challenge_train_sample' 502
'challenge_validation_sample' 499
'test' 1779
'train' 35426
'validation' 1667
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "input": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "webnlg_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

web_nlg_ru

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/web_nlg_ru')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_scramble' 500
'challenge_train_sample' 501
'challenge_validation_sample' 500
'test' 1102
'train' 14630
'validation' 790
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "input": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "webnlg_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

wiki_auto_asset_turk

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/wiki_auto_asset_turk')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_asset_backtranslation' 359
'challenge_test_asset_bfp02' 359
'challenge_test_asset_bfp05' 359
'challenge_test_asset_nopunc' 359
'challenge_test_turk_backtranslation' 359
'challenge_test_turk_bfp02' 359
'challenge_test_turk_bfp05' 359
'challenge_test_turk_nopunc' 359
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test_asset' 359
'test_turk' 359
'train' 483801
'validation' 20000
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}

esquema_guided_dialog

Utilice el siguiente comando para cargar este conjunto de datos en TFDS:

ds = tfds.load('huggingface:gem/schema_guided_dialog')
  • Descripción :
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation,
both through human annotations and automated Metrics.

GEM aims to:
- measure NLG progress across 13 datasets spanning many NLG tasks and languages.
- provide an in-depth analysis of data and models presented via data statements and challenge sets.
- develop standards for evaluation of generated text using both automated and human metrics.

It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development
by extending existing data or developing datasets for additional languages.
  • Licencia : CC-BY-SA-4.0
  • Versión : 1.1.0
  • Divisiones :
Separar Ejemplos
'challenge_test_backtranslation' 500
'challenge_test_bfp02' 500
'challenge_test_bfp05' 500
'challenge_test_nopunc' 500
'challenge_test_scramble' 500
'challenge_train_sample' 500
'challenge_validation_sample' 500
'test' 10000
'train' 164982
'validation' 10000
  • Características :
{
    "gem_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gem_parent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "dialog_acts": [
        {
            "act": {
                "num_classes": 18,
                "names": [
                    "AFFIRM",
                    "AFFIRM_INTENT",
                    "CONFIRM",
                    "GOODBYE",
                    "INFORM",
                    "INFORM_COUNT",
                    "INFORM_INTENT",
                    "NEGATE",
                    "NEGATE_INTENT",
                    "NOTIFY_FAILURE",
                    "NOTIFY_SUCCESS",
                    "OFFER",
                    "OFFER_INTENT",
                    "REQUEST",
                    "REQUEST_ALTS",
                    "REQ_MORE",
                    "SELECT",
                    "THANK_YOU"
                ],
                "id": null,
                "_type": "ClassLabel"
            },
            "slot": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "values": [
                {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                }
            ]
        }
    ],
    "context": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ],
    "dialog_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "service": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "turn_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "prompt": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "target": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "references": [
        {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        }
    ]
}