mrqa

  • Description:

The MRQA 2019 Shared Task focuses on generalization in question answering. An effective question answering system should do more than merely interpolate from the training set to answer test examples drawn from the same distribution: it should also be able to extrapolate to out-of-distribution examples, a far harder challenge.

MRQA adapts and unifies several distinct question answering datasets (carefully selected subsets of existing datasets) into the same format (the SQuAD format). Six of these datasets are available for training and six for testing. A small portion of each training dataset is held out as in-domain data that can be used for development. The test datasets contain only out-of-domain data. This benchmark was released as part of the MRQA 2019 Shared Task.

More information can be found at: https://mrqa.github.io/2019/shared.html.
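
The benchmark is distributed through TensorFlow Datasets with one config per contained dataset. A minimal loading sketch (assuming tensorflow_datasets is installed and your TFDS version ships the mrqa builder):

import tensorflow_datasets as tfds

# Training configs (e.g. mrqa/squad) expose 'train' and 'validation' splits;
# the held-out 'validation' portion is the in-domain development data.
train_ds, dev_ds = tfds.load("mrqa/squad", split=["train", "validation"])

# Test configs (e.g. mrqa/bio_asq) contain only out-of-domain data and
# expose a single 'test' split.
ood_ds = tfds.load("mrqa/bio_asq", split="test")

for example in train_ds.take(1):
    print(example["question"].numpy().decode("utf-8"))
    print(example["answers"].numpy())  # acceptable answer strings (as bytes)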

FeaturesDict({
    'answers': Sequence(string),
    'context': string,
    'context_tokens': Sequence({
        'offsets': int32,
        'tokens': string,
    }),
    'detected_answers': Sequence({
        'char_spans': Sequence({
            'end': int32,
            'start': int32,
        }),
        'text': string,
        'token_spans': Sequence({
            'end': int32,
            'start': int32,
        }),
    }),
    'qid': string,
    'question': string,
    'question_tokens': Sequence({
        'offsets': int32,
        'tokens': string,
    }),
    'subset': string,
})
  • Feature documentation:
Feature                             Class             Shape    Dtype   Description
                                    FeaturesDict
answers                             Sequence(Tensor)  (None,)  string
context                             Tensor                     string
context_tokens                      Sequence
context_tokens/offsets              Tensor                     int32
context_tokens/tokens               Tensor                     string
detected_answers                    Sequence
detected_answers/char_spans         Sequence
detected_answers/char_spans/end     Tensor                     int32
detected_answers/char_spans/start   Tensor                     int32
detected_answers/text               Tensor                     string
detected_answers/token_spans        Sequence
detected_answers/token_spans/end    Tensor                     int32
detected_answers/token_spans/start  Tensor                     int32
qid                                 Tensor                     string
question                            Tensor                     string
question_tokens                     Sequence
question_tokens/offsets             Tensor                     int32
question_tokens/tokens              Tensor                     string
subset                              Tensor                     string
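
The detected_answers feature lists each occurrence of a gold answer in the context, with character and token offsets. A sketch of recovering an answer string from its char span (assumptions: the MRQA convention of inclusive span offsets, and that nested sequences come back as tf.RaggedTensor when iterating the tf.data pipeline):

import tensorflow_datasets as tfds

ds = tfds.load("mrqa/squad", split="validation")

for ex in ds.take(1):
    context = ex["context"].numpy().decode("utf-8")
    spans = ex["detected_answers"]["char_spans"]
    # One list of (start, end) spans per detected answer; take the first span
    # of the first detected answer.
    start = int(spans["start"][0][0])
    end = int(spans["end"][0][0])
    # End offsets are inclusive in the MRQA format, hence the +1 when slicing.
    print(context[start:end + 1])
    print(ex["detected_answers"]["text"][0].numpy().decode("utf-8"))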

mrqa/squad (default config)

  • Config description: The SQuAD (Stanford Question Answering Dataset) is used as the basis for the shared task format. Crowdworkers are shown paragraphs from Wikipedia and asked to write questions with extractive answers.

  • Download size: 29.66 MiB

  • Dataset size: 271.43 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 86,588
'validation' 10,507
  • Citation:
@inproceedings{rajpurkar-etal-2016-squad,
    title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
    author = "Rajpurkar, Pranav  and
      Zhang, Jian  and
      Lopyrev, Konstantin  and
      Liang, Percy",
    booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2016",
    address = "Austin, Texas",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D16-1264",
    doi = "10.18653/v1/D16-1264",
    pages = "2383--2392",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/news_qa

  • Config description: Two sets of crowdworkers ask and answer questions based on CNN news articles. The "questioners" see only a headline and summary of the article, while the "answerers" see the full article. Questions that have no answer or that are flagged in the dataset as lacking annotator agreement are discarded.

  • Download size: 56.83 MiB

  • Dataset size: 654.25 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 74,160
'validation' 4,212
  • Citation:
@inproceedings{trischler-etal-2017-newsqa,
        title = "{N}ews{QA}: A Machine Comprehension Dataset",
        author = "Trischler, Adam  and
          Wang, Tong  and
          Yuan, Xingdi  and
          Harris, Justin  and
          Sordoni, Alessandro  and
          Bachman, Philip  and
          Suleman, Kaheer",
        booktitle = "Proceedings of the 2nd Workshop on Representation Learning for {NLP}",
        month = aug,
        year = "2017",
        address = "Vancouver, Canada",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/W17-2623",
        doi = "10.18653/v1/W17-2623",
        pages = "191--200",
    }
@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/trivia_qa

  • Config description: Question and answer pairs are sourced from trivia and quiz-league websites. The web version of TriviaQA is used, where the contexts are retrieved from the results of a Bing search query.

  • Download size: 383.14 MiB

  • Dataset size: 772.75 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 61,688
'validation' 7,785
  • Citation:
@inproceedings{joshi-etal-2017-triviaqa,
    title = "{T}rivia{QA}: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension",
    author = "Joshi, Mandar  and
      Choi, Eunsol  and
      Weld, Daniel  and
      Zettlemoyer, Luke",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1147",
    doi = "10.18653/v1/P17-1147",
    pages = "1601--1611",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/search_qa

  • Config description: Question and answer pairs are sourced from the Jeopardy! TV show. The contexts are composed of snippets retrieved from a Google search query.

  • Download size: 699.86 MiB

  • Dataset size: 1.38 GiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 117,384
'validation' 16,980
  • Citation:
@article{dunn2017searchqa,
    title={Searchqa: A new q\&a dataset augmented with context from a search engine},
    author={Dunn, Matthew and Sagun, Levent and Higgins, Mike and Guney, V Ugur and Cirik, Volkan and Cho, Kyunghyun},
    journal={arXiv preprint arXiv:1704.05179},
    year={2017}
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/hotpot_qa

  • Config description: Crowdworkers are shown two entity-linked paragraphs from Wikipedia and asked to write and answer questions that require multi-hop reasoning to solve. In the original setting, these paragraphs are mixed with additional distractor paragraphs to make inference harder. Here, the distractor paragraphs are not included.

  • Download size: 111.98 MiB

  • Dataset size: 272.87 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 72,928
'validation' 5,901
  • Citation:
@inproceedings{yang-etal-2018-hotpotqa,
    title = "{H}otpot{QA}: A Dataset for Diverse, Explainable Multi-hop Question Answering",
    author = "Yang, Zhilin  and
      Qi, Peng  and
      Zhang, Saizheng  and
      Bengio, Yoshua  and
      Cohen, William  and
      Salakhutdinov, Ruslan  and
      Manning, Christopher D.",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D18-1259",
    doi = "10.18653/v1/D18-1259",
    pages = "2369--2380",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/natural_questions

  • Config description: Questions are collected from information-seeking queries issued to the Google search engine by real users under natural conditions. Answers to the questions are annotated by crowdworkers in a retrieved Wikipedia page. Two types of annotations are collected: 1) the HTML bounding box that contains enough information to completely infer the answer to the question (the long answer), and 2) the subspan or sub-spans within the bounding box that make up the actual answer (the short answer). Only examples that have a short answer are used, and the long answer is used as the context.

  • Download size: 121.15 MiB

  • Dataset size: 339.03 MiB

  • Auto-cached (documentation): No

  • Splits:

Split Examples
'train' 104,071
'validation' 12,836
  • Citation:
@article{kwiatkowski-etal-2019-natural,
    title = "Natural Questions: A Benchmark for Question Answering Research",
    author = "Kwiatkowski, Tom  and
      Palomaki, Jennimaria  and
      Redfield, Olivia  and
      Collins, Michael  and
      Parikh, Ankur  and
      Alberti, Chris  and
      Epstein, Danielle  and
      Polosukhin, Illia  and
      Devlin, Jacob  and
      Lee, Kenton  and
      Toutanova, Kristina  and
      Jones, Llion  and
      Kelcey, Matthew  and
      Chang, Ming-Wei  and
      Dai, Andrew M.  and
      Uszkoreit, Jakob  and
      Le, Quoc  and
      Petrov, Slav",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "7",
    year = "2019",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/Q19-1026",
    doi = "10.1162/tacl_a_00276",
    pages = "452--466",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/bio_asq

  • Config description: BioASQ, a challenge on large-scale biomedical semantic indexing and question answering, contains question and answer pairs created by domain experts. These are then manually linked to multiple related science (PubMed) articles. The full abstract of each linked article is downloaded and used as an individual context (e.g., a single question can be linked to several independent articles to create several QA-context pairs). Abstracts that do not contain the answer exactly are discarded.

  • Download size: 2.54 MiB

  • Dataset size: 6.70 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,504
  • Citation:
@article{tsatsaronis2015overview,
    title={An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition},
    author={Tsatsaronis, George and Balikas, Georgios and Malakasiotis, Prodromos and Partalas, Ioannis and Zschunke, Matthias and Alvers, Michael R and Weissenborn, Dirk and Krithara, Anastasia and Petridis, Sergios and Polychronopoulos, Dimitris and others},
    journal={BMC bioinformatics},
    volume={16},
    number={1},
    pages={1--28},
    year={2015},
    publisher={Springer}
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/drop

  • Config description: Examples from DROP (Discrete Reasoning Over the content of Paragraphs) were collected similarly to SQuAD, where crowdworkers are asked to create question-answer pairs from Wikipedia paragraphs. The questions focus on quantitative reasoning, and the original dataset contains non-extractive numeric answers as well as extractive text answers. Only the extractive subset of questions is used here.

  • Download size: 578.25 KiB

  • Dataset size: 5.41 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,503
  • Citation:
@inproceedings{dua-etal-2019-drop,
    title = "{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs",
    author = "Dua, Dheeru  and
      Wang, Yizhong  and
      Dasigi, Pradeep  and
      Stanovsky, Gabriel  and
      Singh, Sameer  and
      Gardner, Matt",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1246",
    doi = "10.18653/v1/N19-1246",
    pages = "2368--2378",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/duo_rc

  • Config description: The ParaphraseRC split of the DuoRC dataset is used. In this setting, two different plot summaries of the same movie are collected, one from Wikipedia and the other from IMDb. Two different sets of crowdworkers ask and answer questions about the movie plot, where the "questioners" are shown only the Wikipedia page and the "answerers" are shown only the IMDb page. Questions marked as unanswerable are discarded.

  • Download size: 1.14 MiB

  • Dataset size: 15.04 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,501
  • Citation:
@inproceedings{saha-etal-2018-duorc,
    title = "{D}uo{RC}: Towards Complex Language Understanding with Paraphrased Reading Comprehension",
    author = "Saha, Amrita  and
      Aralikatte, Rahul  and
      Khapra, Mitesh M.  and
      Sankaranarayanan, Karthik",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P18-1156",
    doi = "10.18653/v1/P18-1156",
    pages = "1683--1693",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/race

  • Config description: The ReAding Comprehension Dataset From Examinations (RACE) is collected from English reading comprehension exams for Chinese middle and high school students. The more challenging high school split is used, and implicit "fill in the blank" style questions (which are unnatural for this task) are filtered out.

  • Download size: 1.49 MiB

  • Dataset size: 3.53 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 674
  • Citation:
@inproceedings{lai-etal-2017-race,
    title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
    author = "Lai, Guokun  and
      Xie, Qizhe  and
      Liu, Hanxiao  and
      Yang, Yiming  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D17-1082",
    doi = "10.18653/v1/D17-1082",
    pages = "785--794",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/relation_extraction

  • Config description: Given a slot-filling dataset, relations between entities are systematically transformed into question-answer pairs using templates. For example, the educated_at(x, y) relation between two entities x and y that appear in a sentence can be expressed as "Where was x educated?" with answer y. Multiple templates are collected for each relation type. The zero-shot benchmark split of the dataset (generalization to unseen relations) is used, and only the positive examples are kept.

  • Download size: 830.88 KiB

  • Dataset size: 3.71 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 2,948
  • Citation:
@inproceedings{levy-etal-2017-zero,
    title = "Zero-Shot Relation Extraction via Reading Comprehension",
    author = "Levy, Omer  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Zettlemoyer, Luke",
    booktitle = "Proceedings of the 21st Conference on Computational Natural Language Learning ({C}o{NLL} 2017)",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/K17-1034",
    doi = "10.18653/v1/K17-1034",
    pages = "333--342",
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

mrqa/textbook_qa

  • Config description: TextbookQA is collected from lessons in middle school Life Science, Earth Science, and Physical Science textbooks. Questions that are accompanied by a diagram, or that are "True or False" questions, are not included.

  • Download size: 1.79 MiB

  • Dataset size: 14.04 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split Examples
'test' 1,503
  • Citation:
@inproceedings{kembhavi2017you,
    title={Are you smarter than a sixth grader? textbook question answering for multimodal machine comprehension},
    author={Kembhavi, Aniruddha and Seo, Minjoon and Schwenk, Dustin and Choi, Jonghyun and Farhadi, Ali and Hajishirzi, Hannaneh},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern recognition},
    pages={4999--5007},
    year={2017}
}

@inproceedings{fisch-etal-2019-mrqa,
    title = "{MRQA} 2019 Shared Task: Evaluating Generalization in Reading Comprehension",
    author = "Fisch, Adam  and
      Talmor, Alon  and
      Jia, Robin  and
      Seo, Minjoon  and
      Choi, Eunsol  and
      Chen, Danqi",
    booktitle = "Proceedings of the 2nd Workshop on Machine Reading for Question Answering",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5801",
    doi = "10.18653/v1/D19-5801",
    pages = "1--13",
}

Note that each MRQA dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.