टीएफडीएस अब क्रोइसैन 🥐 प्रारूप का समर्थन करता है! अधिक जानने के लिए दस्तावेज़ पढ़ें.

इस पेज का अनुवाद Cloud Translation API से किया गया है.

रत्न

विवरण :

जीईएम मानव एनोटेशन और स्वचालित मेट्रिक्स दोनों के माध्यम से इसके मूल्यांकन पर ध्यान देने के साथ प्राकृतिक भाषा निर्माण के लिए एक बेंचमार्क वातावरण है।

GEM का लक्ष्य है: (1) कई NLG कार्यों और भाषाओं में फैले 13 डेटासेट में NLG की प्रगति को मापना। (2) डेटा स्टेटमेंट और चैलेंज सेट के माध्यम से प्रस्तुत डेटा और मॉडल का गहन विश्लेषण प्रदान करें। (3) स्वचालित और मानव मैट्रिक्स दोनों का उपयोग करके उत्पन्न पाठ के मूल्यांकन के लिए मानक विकसित करना।

अधिक जानकारी https://gem-benchmark.com पर मिल सकती है।

अतिरिक्त दस्तावेज़ीकरण : कोड वाले पेपर्स पर एक्सप्लोर करें
होमपेज : https://gem-benchmark.com
स्रोत कोड : tfds.text.gem.Gem
संस्करण :
- 1.0.0 : प्रारंभिक संस्करण
- 1.0.1 : MLSum के लिए ख़राब लिंक फ़िल्टर अपडेट करें
- 1.1.0 (डिफ़ॉल्ट): चैलेंज सेट की रिलीज़
पर्यवेक्षित कुंजियाँ ( as_supervised doc देखें): None
चित्र ( tfds.show_examples ): समर्थित नहीं है।

मणि/common_gen (डिफ़ॉल्ट कॉन्फ़िगरेशन)

कॉन्फिग विवरण : कॉमनजेन एक विवश पाठ निर्माण कार्य है, जो बेंचमार्क डेटासेट से जुड़ा है, स्पष्ट रूप से जनरेटिव कॉमन्सेंस रीजनिंग की क्षमता के लिए मशीनों का परीक्षण करता है। सामान्य अवधारणाओं के एक सेट को देखते हुए; कार्य इन अवधारणाओं का उपयोग करके रोजमर्रा के परिदृश्य का वर्णन करने वाला एक सुसंगत वाक्य उत्पन्न करना है।
डाउनलोड आकार : 1.84 MiB
डेटासेट का आकार : 16.84 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	1,497
`'train'`	67,389
`'validation'`	993

फ़ीचर संरचना :

FeaturesDict({
    'concept_set_id': int32,
    'concepts': Sequence(string),
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
अवधारणा_सेट_आईडी	टेन्सर		int32
अवधारणाओं	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{lin2020commongen,
  title = "CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning",
  author = "Lin, Bill Yuchen  and
    Zhou, Wangchunshu  and
    Shen, Ming  and
    Zhou, Pei  and
    Bhagavatula, Chandra  and
    Choi, Yejin  and
    Ren, Xiang",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.findings-emnlp.165",
  pages = "1823--1840",
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

जेम/cs_restaurants

कॉन्फिग विवरण : कार्य एक (काल्पनिक) संवाद प्रणाली के संदर्भ में प्रतिक्रिया उत्पन्न कर रहा है जो रेस्तरां के बारे में जानकारी प्रदान करता है। इनपुट एक मूल आशय/संवाद अधिनियम प्रकार और स्लॉट्स (विशेषताओं) और उनके मूल्यों की एक सूची है। आउटपुट एक प्राकृतिक भाषा वाक्य है।
डाउनलोड आकार : 1.46 MiB
डेटासेट का आकार : 2.71 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	842
`'train'`	3,569
`'validation'`	781

फ़ीचर संरचना :

FeaturesDict({
    'dialog_act': string,
    'dialog_act_delexicalized': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'target_delexicalized': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
डायलॉग_एक्ट	टेन्सर		डोरी
dialo_act_delexicalized	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी
target_delexicalized	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{cs_restaurants,
  address = {Tokyo, Japan},
  title = {Neural {Generation} for {Czech}: {Data} and {Baselines} },
  shorttitle = {Neural {Generation} for {Czech} },
  url = {https://www.aclweb.org/anthology/W19-8670/},
  urldate = {2019-10-18},
  booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
  author = {Dušek, Ondřej and Jurčíček, Filip},
  month = oct,
  year = {2019},
  pages = {563--574}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि / डार्ट

कॉन्फिग विवरण : DART उच्च गुणवत्ता वाले वाक्य एनोटेशन के साथ एक बड़ा और ओपन-डोमेन संरचित डेटा रिकॉर्ड टू टेक्स्ट जनरेशन कॉर्पस है, जिसमें प्रत्येक इनपुट ट्री-स्ट्रक्चर्ड ऑन्कोलॉजी के बाद एंटिटी-रिलेशन ट्रिपल का एक सेट है।
डाउनलोड आकार : 28.01 MiB
डेटासेट का आकार : 33.78 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	6,959
`'train'`	62,659
`'validation'`	2,768

फ़ीचर संरचना :

FeaturesDict({
    'dart_id': int32,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'subtree_was_extended': bool,
    'target': string,
    'target_sources': Sequence(string),
    'tripleset': Sequence(string),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
dart_id	टेन्सर		int32
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
subtree_was_extended	टेन्सर		बूल
लक्ष्य	टेन्सर		डोरी
target_sources	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
tripleset	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@article{radev2020dart,
  title=Dart: Open-domain structured data record to text generation,
  author={Radev, Dragomir and Zhang, Rui and Rau, Amrit and Sivaprasad, Abhinand and Hsieh, Chiachun and Rajani, Nazneen Fatema and Tang, Xiangru and Vyas, Aadit and Verma, Neha and Krishna, Pranav and others},
  journal={arXiv preprint arXiv:2007.02871},
  year={2020}
}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/e2e_nlg

कॉन्फ़िगरेशन विवरण : E2E डेटासेट को एक सीमित-डोमेन डेटा-टू-टेक्स्ट कार्य के लिए डिज़ाइन किया गया है - 8 अलग-अलग विशेषताओं (नाम, क्षेत्र, मूल्य सीमा आदि) के आधार पर रेस्तरां विवरण/सिफारिशें तैयार करना।
डाउनलोड आकार : 13.99 MiB
डेटासेट का आकार : 16.92 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	4,693
`'train'`	33,525
`'validation'`	4,299

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'meaning_representation': string,
    'references': Sequence(string),
    'target': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
meaning_representation	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{e2e_cleaned,
  address = {Tokyo, Japan},
  title = {Semantic {Noise} {Matters} for {Neural} {Natural} {Language} {Generation} },
  url = {https://www.aclweb.org/anthology/W19-8652/},
  booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
  author = {Dušek, Ondřej and Howcroft, David M and Rieser, Verena},
  year = {2019},
  pages = {421--426},
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/mlsum_de

कॉन्फ़िग विवरण : MLSum एक बड़े पैमाने पर बहुभाषी सारांश डेटासेट है। यह ऑनलाइन समाचार आउटलेट्स से निर्मित है, यह विभाजन जर्मन पर केंद्रित है।
डाउनलोड आकार : 345.98 MiB
डेटासेट का आकार : 963.60 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_covid'`	5,058
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	10,695
`'train'`	220,748
`'validation'`	11,392

फ़ीचर संरचना :

FeaturesDict({
    'date': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'text': string,
    'title': string,
    'topic': string,
    'url': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
दिनांक	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी
मूलपाठ	टेन्सर		डोरी
शीर्षक	टेन्सर		डोरी
विषय	टेन्सर		डोरी
यूआरएल	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{scialom-etal-2020-mlsum,
    title = "{MLSUM}: The Multilingual Summarization Corpus",
    author = {Scialom, Thomas  and Dray, Paul-Alexis  and Lamprier, Sylvain  and Piwowarski, Benjamin  and Staiano, Jacopo},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/mlsum_es

कॉन्फ़िग विवरण : MLSum एक बड़े पैमाने पर बहुभाषी सारांश डेटासेट है। यह ऑनलाइन समाचार आउटलेट्स से निर्मित है, यह विभाजन स्पेनिश पर केंद्रित है।
डाउनलोड आकार : 501.27 MiB
डेटासेट का आकार : 1.29 GiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_covid'`	1,938
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	13,366
`'train'`	259,888
`'validation'`	9,977

फ़ीचर संरचना :

FeaturesDict({
    'date': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'text': string,
    'title': string,
    'topic': string,
    'url': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
दिनांक	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी
मूलपाठ	टेन्सर		डोरी
शीर्षक	टेन्सर		डोरी
विषय	टेन्सर		डोरी
यूआरएल	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{scialom-etal-2020-mlsum,
    title = "{MLSUM}: The Multilingual Summarization Corpus",
    author = {Scialom, Thomas  and Dray, Paul-Alexis  and Lamprier, Sylvain  and Piwowarski, Benjamin  and Staiano, Jacopo},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year = {2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/स्कीमा_गाइडेड_डायलॉग

कॉन्फ़िगरेशन विवरण : स्कीमा-गाइडेड डायलॉग (SGD) डेटासेट में एक मानव और एक आभासी सहायक के बीच 18K बहु-डोमेन कार्य-उन्मुख संवाद होते हैं, जिसमें बैंकों और घटनाओं से लेकर मीडिया, कैलेंडर, यात्रा और मौसम तक के 17 डोमेन शामिल होते हैं।
डाउनलोड आकार : 17.00 MiB
डेटासेट का आकार : 201.19 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हां (challenge_test_backtranslation, Challenge_test_bfp02, Challenge_test_bfp05, Challenge_test_nopunc, Challenge_test_scramble, Challenge_train_sample, Challenge_validation_sample, परीक्षण, सत्यापन), केवल जब shuffle_files=False (ट्रेन)
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_backtranslation'`	500
`'challenge_test_bfp02'`	500
`'challenge_test_bfp05'`	500
`'challenge_test_nopunc'`	500
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	10,000
`'train'`	164,982
`'validation'`	10,000

फ़ीचर संरचना :

FeaturesDict({
    'context': Sequence(string),
    'dialog_acts': Sequence({
        'act': ClassLabel(shape=(), dtype=int64, num_classes=18),
        'slot': string,
        'values': Sequence(string),
    }),
    'dialog_id': string,
    'gem_id': string,
    'gem_parent_id': string,
    'prompt': string,
    'references': Sequence(string),
    'service': string,
    'target': string,
    'turn_id': int32,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
dial_acts	क्रम
डायलॉग_एक्ट्स/एक्ट	क्लासलेबल		int64
डायलॉग_एक्ट्स/स्लॉट	टेन्सर		डोरी
डायलॉग_एक्ट्स/वैल्यू	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
डायलॉग_आईडी	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
तत्पर	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
सर्विस	टेन्सर		डोरी
लक्ष्य	टेन्सर		डोरी
टर्न_आईडी	टेन्सर		int32

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@article{rastogi2019towards,
  title={Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset},
  author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
  journal={arXiv preprint arXiv:1909.05855},
  year={2019}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

रत्न / टोटो

कॉन्फ़िग विवरण : ToTTo एक टेबल-टू-टेक्स्ट NLG कार्य है। कार्य इस प्रकार है: पंक्ति नाम, स्तंभ नाम और तालिका कक्षों के साथ एक विकिपीडिया तालिका दी गई है, जिसमें कोशिकाओं का एक सबसेट हाइलाइट किया गया है, तालिका के हाइलाइट किए गए भाग के लिए एक प्राकृतिक भाषा विवरण तैयार करें।
डाउनलोड आकार : 180.75 MiB
डेटासेट का आकार : 645.86 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	7,700
`'train'`	121,153
`'validation'`	7,700

फ़ीचर संरचना :

FeaturesDict({
    'example_id': string,
    'gem_id': string,
    'gem_parent_id': string,
    'highlighted_cells': Sequence(Sequence(int32)),
    'overlap_subset': string,
    'references': Sequence(string),
    'sentence_annotations': Sequence({
        'final_sentence': string,
        'original_sentence': string,
        'sentence_after_ambiguity': string,
        'sentence_after_deletion': string,
    }),
    'table': Sequence(Sequence({
        'column_span': int32,
        'is_header': bool,
        'row_span': int32,
        'value': string,
    })),
    'table_page_title': string,
    'table_section_text': string,
    'table_section_title': string,
    'table_webpage_url': string,
    'target': string,
    'totto_id': int32,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
example_id	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
Highlight_cells	अनुक्रम (अनुक्रम (टेंसर))	(कोई नहीं, कोई नहीं)	int32
ओवरलैप_उपसेट	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
वाक्य_टिप्पणियाँ	क्रम
वाक्य_एनोटेशन/अंतिम_वाक्य	टेन्सर		डोरी
वाक्य_टिप्पणियां/मूल_वाक्य	टेन्सर		डोरी
वाक्य_टिप्पणियां/वाक्य_बाद_अस्पष्टता	टेन्सर		डोरी
वाक्य_टिप्पणी/वाक्य_बाद_हटाना	टेन्सर		डोरी
मेज़	क्रम
टेबल/कॉलम_स्पैन	टेन्सर		int32
टेबल/is_header	टेन्सर		बूल
टेबल/row_span	टेन्सर		int32
तालिका / मूल्य	टेन्सर		डोरी
टेबल_पेज_टाइटल	टेन्सर		डोरी
टेबल_सेक्शन_टेक्स्ट	टेन्सर		डोरी
टेबल_सेक्शन_टाइटल	टेन्सर		डोरी
टेबल_वेबपेज_url	टेन्सर		डोरी
लक्ष्य	टेन्सर		डोरी
totto_id	टेन्सर		int32

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{parikh2020totto,
  title=ToTTo: A Controlled Table-To-Text Generation Dataset,
  author={Parikh, Ankur and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={1173--1186},
  year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

रत्न/web_nlg_en

कॉन्फिग विवरण : वेबएनएलजी समानांतर डीबीपीडिया ट्रिपल सेट और छोटे टेक्स्ट का एक द्विभाषी डेटासेट (अंग्रेजी, रूसी) है जो लगभग 450 विभिन्न डीबीपीडिया गुणों को कवर करता है। वेबएनएलजी डेटा मूल रूप से लघु पाठ उत्पन्न करने और माइक्रो-प्लानिंग को संभालने में सक्षम आरडीएफ वर्बलाइजर्स के विकास को बढ़ावा देने के लिए बनाया गया था।
डाउनलोड आकार : 12.57 MiB
डेटासेट का आकार : 19.91 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_numbers'`	500
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	502
`'challenge_validation_sample'`	499
`'test'`	1,779
`'train'`	35,426
`'validation'`	1,667

फ़ीचर संरचना :

FeaturesDict({
    'category': string,
    'gem_id': string,
    'gem_parent_id': string,
    'input': Sequence(string),
    'references': Sequence(string),
    'target': string,
    'webnlg_id': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
श्रेणी	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
इनपुट	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी
webnlg_id	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{gardent2017creating,
  author = "Gardent, Claire
    and Shimorina, Anastasia
    and Narayan, Shashi
    and Perez-Beltrachini, Laura",
  title = "Creating Training Corpora for NLG Micro-Planners",
  booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2017",
  publisher = "Association for Computational Linguistics",
  pages = "179--188",
  location = "Vancouver, Canada",
  doi = "10.18653/v1/P17-1017",
  url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/web_nlg_ru

कॉन्फिग विवरण : वेबएनएलजी समानांतर डीबीपीडिया ट्रिपल सेट और छोटे टेक्स्ट का एक द्विभाषी डेटासेट (अंग्रेजी, रूसी) है जो लगभग 450 विभिन्न डीबीपीडिया गुणों को कवर करता है। वेबएनएलजी डेटा मूल रूप से लघु पाठ उत्पन्न करने और माइक्रो-प्लानिंग को संभालने में सक्षम आरडीएफ वर्बलाइजर्स के विकास को बढ़ावा देने के लिए बनाया गया था।
डाउनलोड आकार : 7.49 MiB
डेटासेट का आकार : 11.30 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_scramble'`	500
`'challenge_train_sample'`	501
`'challenge_validation_sample'`	500
`'test'`	1,102
`'train'`	14,630
`'validation'`	790

फ़ीचर संरचना :

FeaturesDict({
    'category': string,
    'gem_id': string,
    'gem_parent_id': string,
    'input': Sequence(string),
    'references': Sequence(string),
    'target': string,
    'webnlg_id': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
श्रेणी	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
इनपुट	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी
webnlg_id	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{gardent2017creating,
  author = "Gardent, Claire
    and Shimorina, Anastasia
    and Narayan, Shashi
    and Perez-Beltrachini, Laura",
  title = "Creating Training Corpora for NLG Micro-Planners",
  booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2017",
  publisher = "Association for Computational Linguistics",
  pages = "179--188",
  location = "Vancouver, Canada",
  doi = "10.18653/v1/P17-1017",
  url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_auto_asset_turk

कॉन्फिग विवरण : विकीऑटो वाक्य सरलीकरण प्रणालियों को प्रशिक्षित करने के लिए एक संसाधन के रूप में अंग्रेजी विकिपीडिया और सरल अंग्रेजी विकिपीडिया से संरेखित वाक्यों का एक सेट प्रदान करता है। ASSET और TURK उच्च गुणवत्ता वाले सरलीकरण डेटासेट हैं जिनका उपयोग परीक्षण के लिए किया जाता है।
डाउनलोड आकार : 121.01 MiB
डेटासेट का आकार : 202.40 MiB
Auto-cached ( documentation ): Yes (challenge_test_asset_backtranslation, challenge_test_asset_bfp02, challenge_test_asset_bfp05, challenge_test_asset_nopunc, challenge_test_turk_backtranslation, challenge_test_turk_bfp02, challenge_test_turk_bfp05, challenge_test_turk_nopunc, challenge_train_sample, challenge_validation_sample, test_asset, test_turk, validation), Only when shuffle_files=False (train)
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_asset_backtranslation'`	359
`'challenge_test_asset_bfp02'`	359
`'challenge_test_asset_bfp05'`	359
`'challenge_test_asset_nopunc'`	359
`'challenge_test_turk_backtranslation'`	359
`'challenge_test_turk_bfp02'`	359
`'challenge_test_turk_bfp05'`	359
`'challenge_test_turk_nopunc'`	359
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test_asset'`	359
`'test_turk'`	359
`'train'`	483,801
`'validation'`	20,000

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'target': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
लक्ष्य	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{jiang-etal-2020-neural,
    title = "Neural {CRF} Model for Sentence Alignment in Text Simplification",
    author = "Jiang, Chao  and
      Maddela, Mounica  and
      Lan, Wuwei  and
      Zhong, Yang  and
      Xu, Wei",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.709",
    doi = "10.18653/v1/2020.acl-main.709",
    pages = "7943--7960",
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/xsum

कॉन्फ़िग विवरण : डेटासेट अपने चरम रूप में अमूर्त संक्षेपण के कार्य के लिए है, यह एक दस्तावेज़ को एक वाक्य में सारांशित करने के बारे में है।
डाउनलोड का आकार : 246.31 MiB
डेटासेट का आकार : 78.89 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'challenge_test_backtranslation'`	500
`'challenge_test_bfp_02'`	500
`'challenge_test_bfp_05'`	500
`'challenge_test_covid'`	401
`'challenge_test_nopunc'`	500
`'challenge_train_sample'`	500
`'challenge_validation_sample'`	500
`'test'`	1,166
`'train'`	23,206
`'validation'`	1,117

फ़ीचर संरचना :

FeaturesDict({
    'document': string,
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'target': string,
    'xsum_id': string,
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
दस्तावेज़	टेन्सर		डोरी
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
लक्ष्य	टेन्सर		डोरी
xsum_id	टेन्सर		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{Narayan2018dont,
  author = "Shashi Narayan and Shay B. Cohen and Mirella Lapata",
  title = "Don't Give Me the Details, Just the Summary! {T}opic-Aware Convolutional Neural Networks for Extreme Summarization",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing ",
  year = "2018",
  address = "Brussels, Belgium",
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_arabic_ar

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 56.25 MiB
डेटासेट का आकार : 291.42 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	5,841
`'train'`	20,441
`'validation'`	2,919

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'ar': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'ar': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/एआर	मूलपाठ		डोरी
स्रोत_संरेखित/hi	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
target_aligned/ar	मूलपाठ		डोरी
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_chinese_zh

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 31.38 MiB
डेटासेट का आकार : 122.06 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	3,775
`'train'`	13,211
`'validation'`	1,886

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'zh': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'zh': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_गठबंधन/zh	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/zh	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_czech_cs

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 13.84 MiB
डेटासेट का आकार : 58.05 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	1,438
`'train'`	5,033
`'validation'`	718

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'cs': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'cs': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/cs	मूलपाठ		डोरी
स्रोत_संरेखित/hi	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
target_aligned/cs	मूलपाठ		डोरी
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_dutch_nl

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 53.88 MiB
डेटासेट का आकार : 237.97 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ (परीक्षण, सत्यापन), केवल जब shuffle_files=False (ट्रेन)
विभाजन :

विभाजित करना	उदाहरण
`'test'`	6,248
`'train'`	21,866
`'validation'`	3,123

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'nl': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'nl': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/nl	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/nl	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_english_en

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 112.56 MiB
डेटासेट का आकार : 657.51 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	28,614
`'train'`	99,020
`'validation'`	13,823

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_french_fr

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 113.26 MiB
डेटासेट का आकार : 522.28 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	12,731
`'train'`	44,556
`'validation'`	6,364

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'fr': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'fr': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/fr	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/fr	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

रत्न/wiki_lingua_german_de

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड का आकार : 102.65 MiB
डेटासेट का आकार : 452.46 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	11,669
`'train'`	40,839
`'validation'`	5,833

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'de': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'de': Text(shape=(), dtype=string),
        'en': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/डी	मूलपाठ		डोरी
स्रोत_संरेखित/hi	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
target_aligned/de	मूलपाठ		डोरी
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_hindi_hi

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 20.07 MiB
डेटासेट का आकार : 138.06 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	1,984
`'train'`	6,942
`'validation'`	991

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'hi': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'hi': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/हाय	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/हाय	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_indonesian_id

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 80.08 MiB
डेटासेट का आकार : 370.63 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	9,497
`'train'`	33,237
`'validation'`	4,747

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'id': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'id': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/आईडी	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/id	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_italian_it

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 84.80 MiB
डेटासेट का आकार : 374.40 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	10,189
`'train'`	35,661
`'validation'`	5,093

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'it': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'it': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/it	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/it	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_japanese_ja

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 21.75 MiB
डेटासेट का आकार : 103.19 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	2,530
`'train'`	8,853
`'validation'`	1,264

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ja': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ja': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/जा	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/ja	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_korean_ko

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 22.26 MiB
डेटासेट का आकार : 102.35 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	2,436
`'train'`	8,524
`'validation'`	1,216

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ko': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ko': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/ko	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/ko	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_portuguese_pt

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 131.17 MiB
डेटासेट का आकार : 570.46 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	16,331
`'train'`	57,159
`'validation'`	8,165

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'pt': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'pt': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/पीटी	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/pt	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_russian_ru

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 101.36 MiB
डेटासेट का आकार : 564.69 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	10,580
`'train'`	37,028
`'validation'`	5,288

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ru': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'ru': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_गठबंधन/आरयू	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/ru	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_spanish_es

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 189.06 MiB
डेटासेट का आकार : 849.75 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :

विभाजित करना	उदाहरण
`'test'`	22,632
`'train'`	79,212
`'validation'`	11,316

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'es': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'es': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/es	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/es	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

रत्न/wiki_lingua_thai_th

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 28.60 MiB
डेटासेट का आकार : 193.77 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ (परीक्षण, सत्यापन), केवल जब shuffle_files=False (ट्रेन)
विभाजन :

विभाजित करना	उदाहरण
`'test'`	2,950
`'train'`	10,325
`'validation'`	1,475

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'th': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'th': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/वें	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/th	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_turkish_tr

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 6.73 MiB
डेटासेट का आकार : 30.75 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	900
`'train'`	3,148
`'validation'`	449

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'tr': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'tr': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/tr	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/tr	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

मणि/wiki_lingua_vietnamese_vi

कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार : 36.27 MiB
डेटासेट का आकार : 179.77 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :

विभाजित करना	उदाहरण
`'test'`	3,917
`'train'`	13,707
`'validation'`	1,957

फ़ीचर संरचना :

FeaturesDict({
    'gem_id': string,
    'gem_parent_id': string,
    'references': Sequence(string),
    'source': string,
    'source_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'vi': Text(shape=(), dtype=string),
    }),
    'target': string,
    'target_aligned': Translation({
        'en': Text(shape=(), dtype=string),
        'vi': Text(shape=(), dtype=string),
    }),
})

फ़ीचर दस्तावेज़ीकरण :

विशेषता	कक्षा	आकार	डीटाइप
	विशेषताएं डिक्ट
Gem_id	टेन्सर		डोरी
Gem_parent_id	टेन्सर		डोरी
संदर्भ	अनुक्रम (टेंसर)	(कोई भी नहीं,)	डोरी
स्रोत	टेन्सर		डोरी
स्रोत_गठबंधन	अनुवाद
स्रोत_संरेखित/hi	मूलपाठ		डोरी
स्रोत_संरेखित/vi	मूलपाठ		डोरी
लक्ष्य	टेन्सर		डोरी
target_aligned	अनुवाद
लक्ष्य_संरेखित/hi	मूलपाठ		डोरी
target_aligned/vi	मूलपाठ		डोरी

उदाहरण ( tfds.as_dataframe ):

उद्धरण :

@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
  author    = {Sebastian Gehrmann and
               Tosin P. Adewumi and
               Karmanya Aggarwal and
               Pawan Sasanka Ammanamanchi and
               Aremu Anuoluwapo and
               Antoine Bosselut and
               Khyathi Raghavi Chandu and
               Miruna{-}Adriana Clinciu and
               Dipanjan Das and
               Kaustubh D. Dhole and
               Wanyu Du and
               Esin Durmus and
               Ondrej Dusek and
               Chris Emezue and
               Varun Gangal and
               Cristina Garbacea and
               Tatsunori Hashimoto and
               Yufang Hou and
               Yacine Jernite and
               Harsh Jhamtani and
               Yangfeng Ji and
               Shailza Jolly and
               Dhruv Kumar and
               Faisal Ladhak and
               Aman Madaan and
               Mounica Maddela and
               Khyati Mahajan and
               Saad Mahamood and
               Bodhisattwa Prasad Majumder and
               Pedro Henrique Martins and
               Angelina McMillan{-}Major and
               Simon Mille and
               Emiel van Miltenburg and
               Moin Nadeem and
               Shashi Narayan and
               Vitaly Nikolaev and
               Rubungo Andre Niyongabo and
               Salomey Osei and
               Ankur P. Parikh and
               Laura Perez{-}Beltrachini and
               Niranjan Ramesh Rao and
               Vikas Raunak and
               Juan Diego Rodriguez and
               Sashank Santhanam and
               Jo{\~{a} }o Sedoc and
               Thibault Sellam and
               Samira Shaikh and
               Anastasia Shimorina and
               Marco Antonio Sobrevilla Cabezudo and
               Hendrik Strobelt and
               Nishant Subramani and
               Wei Xu and
               Diyi Yang and
               Akhila Yerukola and
               Jiawei Zhou},
  title     = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
               Metrics},
  journal   = {CoRR},
  volume    = {abs/2102.01672},
  year      = {2021},
  url       = {https://arxiv.org/abs/2102.01672},
  archivePrefix = {arXiv},
  eprint    = {2102.01672}
}

Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."

रत्न संग्रह की मदद से व्यवस्थित रहें अपनी प्राथमिकताओं के आधार पर, कॉन्टेंट को सेव करें और कैटगरी में बांटें.