glue

  • Description:

GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
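
Every config below loads through the same TensorFlow Datasets entry point. A minimal sketch, assuming a standard TFDS installation (any config name from this page can be substituted):

import tensorflow_datasets as tfds

# Load one GLUE config; the valid split names are listed under
# "Splits" for each config below.
ds = tfds.load('glue/cola', split='train')
for example in ds.take(1):
    print(example['sentence'].numpy(), int(example['label']))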

glue/cola (default config)

  • Config description: The Corpus of Linguistic Acceptability consists of English acceptability judgments drawn from books and journal articles on linguistic theory. Each example is a sequence of words annotated with whether it is a grammatical English sentence.

  • Homepage: https://nyu-mll.github.io/CoLA/

  • Download size: 368.14 KiB

  • Dataset size: 965.49 KiB

  • Splits:

Split                     Examples
'test'                    1,063
'train'                   8,551
'validation'              1,043
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence': Text(shape=(), dtype=tf.string),
})
  • Citation:
@article{warstadt2018neural,
  title={Neural Network Acceptability Judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1805.12471},
  year={2018}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
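
Since the 'label' feature is a ClassLabel, its integer values can be mapped back to human-readable names through the DatasetInfo returned by with_info=True. A small sketch, assuming the FeaturesDict shown above:

import tensorflow_datasets as tfds

# DatasetInfo.features mirrors the FeaturesDict printed above;
# ClassLabel.int2str converts an integer label to its string name.
ds, info = tfds.load('glue/cola', split='validation', with_info=True)
label_feature = info.features['label']
for ex in ds.take(3):
    print(label_feature.int2str(int(ex['label'])),
          ex['sentence'].numpy().decode('utf-8'))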

glue/sst2

  • Config description: The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. We use the two-way (positive/negative) class split, and use only sentence-level labels.

  • Homepage: https://nlp.stanford.edu/sentiment/index.html

  • Download size: 7.09 MiB

  • Dataset size: 7.22 MiB

  • Splits:

Split                     Examples
'test'                    1,821
'train'                   67,349
'validation'              872
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence': Text(shape=(), dtype=tf.string),
})
  • Citation:
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew and Potts, Christopher},
  booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
  pages={1631--1642},
  year={2013}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
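
Because glue/sst2 is single-sentence binary classification, a training pipeline typically just pairs the sentence with its label. A minimal sketch (the shuffle buffer and batch size are illustrative choices, not part of the dataset):

import tensorflow_datasets as tfds

train = tfds.load('glue/sst2', split='train')
# Map each FeaturesDict to a (sentence, label) pair, then shuffle and batch.
train = (train
         .map(lambda ex: (ex['sentence'], ex['label']))
         .shuffle(10_000)
         .batch(32))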

glue/mrpc

  • Config description: The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

  • Homepage: https://www.microsoft.com/en-us/download/details.aspx?id=52398

  • Download size: 1.43 MiB

  • Dataset size: 1.74 MiB

  • Splits:

Split                     Examples
'test'                    1,725
'train'                   3,668
'validation'              408
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string),
    'sentence2': Text(shape=(), dtype=tf.string),
})
  • Citation:
@inproceedings{dolan2005automatically,
  title={Automatically constructing a corpus of sentential paraphrases},
  author={Dolan, William B and Brockett, Chris},
  booktitle={Proceedings of the Third International Workshop on Paraphrasing (IWP2005)},
  year={2005}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
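
glue/mrpc is the first two-sentence config on this page; its FeaturesDict carries 'sentence1' and 'sentence2' rather than a single 'sentence'. A hedged sketch of keeping the pair aligned with its label for a pair-classification model:

import tensorflow_datasets as tfds

ds = tfds.load('glue/mrpc', split='train')
# Keep both sentences together so an encoder can consume them as a pair.
pairs = ds.map(lambda ex: ((ex['sentence1'], ex['sentence2']), ex['label']))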

glue/qqp

  • Config description: The Quora Question Pairs dataset is a collection of question pairs from the community question-answering website Quora. The task is to determine whether a pair of questions are semantically equivalent.

  • Homepage: https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs

  • Download size: 39.76 MiB

  • Dataset size: 150.37 MiB

  • Splits:

Split                     Examples
'test'                    390,965
'train'                   363,846
'validation'              40,430
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'question1': Text(shape=(), dtype=tf.string),
    'question2': Text(shape=(), dtype=tf.string),
})
  • Citation:
@online{WinNT,
  author = {Iyer, Shankar and Dandekar, Nikhil and Csernai, Kornel},
  title = {First Quora Dataset Release: Question Pairs},
  year = 2017,
  url = {https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs},
  urldate = {2019-04-03}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
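
Note that for glue/qqp the unlabeled test split is larger than the training split. The per-split counts shown in the table above can also be read programmatically from the builder metadata; a quick sketch:

import tensorflow_datasets as tfds

# tfds.builder() exposes split metadata without iterating the examples.
builder = tfds.builder('glue/qqp')
for name, split_info in builder.info.splits.items():
    print(name, split_info.num_examples)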

glue/stsb

  • Config description: The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5.

  • Homepage: http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark

  • Download size: 784.05 KiB

  • Dataset size: 1.58 MiB

  • Splits:

Split                     Examples
'test'                    1,379
'train'                   5,749
'validation'              1,500
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': tf.float32,
    'sentence1': Text(shape=(), dtype=tf.string),
    'sentence2': Text(shape=(), dtype=tf.string),
})
  • Citation:
@article{cer2017semeval,
  title={Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation},
  author={Cer, Daniel and Diab, Mona and Agirre, Eneko and Lopez-Gazpio, Inigo and Specia, Lucia},
  journal={arXiv preprint arXiv:1708.00055},
  year={2017}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
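
Unlike the other configs, glue/stsb has a continuous tf.float32 'label' (the 0-5 similarity score) rather than a ClassLabel, so it should be treated as a regression target. A minimal sketch that rescales the score to [0, 1], a common but optional preprocessing choice:

import tensorflow_datasets as tfds

ds = tfds.load('glue/stsb', split='train')
# The similarity score lies in [0, 5]; dividing by 5 gives a [0, 1] target.
ds = ds.map(lambda ex: ((ex['sentence1'], ex['sentence2']),
                        ex['label'] / 5.0))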

glue/mnli

  • Config description: The Multi-Genre Natural Language Inference Corpus is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. We also use and recommend the SNLI corpus as 550k examples of auxiliary training data.

  • Homepage: http://www.nyu.edu/projects/bowman/multinli/

  • Download size: 298.29 MiB

  • Dataset size: 100.56 MiB

  • Splits:

Split                     Examples
'test_matched'            9,796
'test_mismatched'         9,847
'train'                   392,702
'validation_matched'      9,815
'validation_mismatched'   9,832
  • Features:
FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string),
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Text(shape=(), dtype=tf.string),
})
  • Citation:
@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
@article{bowman2015large,
  title={A large annotated corpus for learning natural language inference},
  author={Bowman, Samuel R and Angeli, Gabor and Potts, Christopher and Manning, Christopher D},
  journal={arXiv preprint arXiv:1508.05326},
  year={2015}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
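
Because glue/mnli evaluates both in-domain and cross-domain sections, it exposes separate matched/mismatched validation and test splits, as listed in the table above. Loading both validation sections, as a sketch:

import tensorflow_datasets as tfds

# Matched = same genres as training; mismatched = held-out genres.
val_matched = tfds.load('glue/mnli', split='validation_matched')
val_mismatched = tfds.load('glue/mnli', split='validation_mismatched')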

glue/mnli_mismatched

  • Config description: The mismatched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

  • Homepage: http://www.nyu.edu/projects/bowman/multinli/

  • Download size: 298.29 MiB

  • Dataset size: 4.79 MiB

  • Splits:

Split                     Examples
'test'                    9,847
'validation'              9,832
  • Features:
FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string),
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Text(shape=(), dtype=tf.string),
})
  • Citation:
@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
@article{bowman2015large,
  title={A large annotated corpus for learning natural language inference},
  author={Bowman, Samuel R and Angeli, Gabor and Potts, Christopher and Manning, Christopher D},
  journal={arXiv preprint arXiv:1508.05326},
  year={2015}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/mnli_matched

  • Config description: The matched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

  • Homepage: http://www.nyu.edu/projects/bowman/multinli/

  • Download size: 298.29 MiB

  • Dataset size: 4.58 MiB

  • Splits:

Split                     Examples
'test'                    9,796
'validation'              9,815
  • Features:
FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string),
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Text(shape=(), dtype=tf.string),
})
  • Citation:
@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}
@article{bowman2015large,
  title={A large annotated corpus for learning natural language inference},
  author={Bowman, Samuel R and Angeli, Gabor and Potts, Christopher and Manning, Christopher D},
  journal={arXiv preprint arXiv:1508.05326},
  year={2015}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/qnli

  • Config description: The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator). We convert the task into sentence pair classification by forming a pair between each question and each sentence in the corresponding context, and filtering out pairs with low lexical overlap between the question and the context sentence. The task is to determine whether the context sentence contains the answer to the question. This modified version of the original task removes the requirement that the model select the exact answer, but also removes the simplifying assumptions that the answer is always present in the input and that lexical overlap is a reliable cue.

  • Homepage: https://rajpurkar.github.io/SQuAD-explorer/

  • Download size: 10.14 MiB

  • Dataset size: 32.99 MiB

  • Splits:

Split                     Examples
'test'                    5,463
'train'                   104,743
'validation'              5,463
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'question': Text(shape=(), dtype=tf.string),
    'sentence': Text(shape=(), dtype=tf.string),
})
  • Citation:
@article{rajpurkar2016squad,
  title={Squad: 100,000+ questions for machine comprehension of text},
  author={Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy},
  journal={arXiv preprint arXiv:1606.05250},
  year={2016}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/rte

  • Config description: The Recognizing Textual Entailment (RTE) datasets come from a series of annual textual entailment challenges. We combine the data from RTE1 (Dagan et al., 2006), RTE2 (Bar Haim et al., 2006), RTE3 (Giampiccolo et al., 2007), and RTE5 (Bentivogli et al., 2009). Examples are constructed based on news and Wikipedia text. We convert all datasets to a two-class split, where for three-class datasets we collapse neutral and contradiction into not entailment, for consistency.

  • Homepage: https://aclweb.org/aclwiki/Recognizing_Textual_Entailment

  • Download size: 680.81 KiB

  • Dataset size: 2.15 MiB

  • Splits:

Split                     Examples
'test'                    3,000
'train'                   2,490
'validation'              277
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string),
    'sentence2': Text(shape=(), dtype=tf.string),
})
  • Citation:
@inproceedings{dagan2005pascal,
  title={The PASCAL recognising textual entailment challenge},
  author={Dagan, Ido and Glickman, Oren and Magnini, Bernardo},
  booktitle={Machine Learning Challenges Workshop},
  pages={177--190},
  year={2005},
  organization={Springer}
}
@inproceedings{bar2006second,
  title={The second pascal recognising textual entailment challenge},
  author={Bar-Haim, Roy and Dagan, Ido and Dolan, Bill and Ferro, Lisa and Giampiccolo, Danilo and Magnini, Bernardo and Szpektor, Idan},
  booktitle={Proceedings of the second PASCAL challenges workshop on recognising textual entailment},
  volume={6},
  number={1},
  pages={6--4},
  year={2006},
  organization={Venice}
}
@inproceedings{giampiccolo2007third,
  title={The third pascal recognizing textual entailment challenge},
  author={Giampiccolo, Danilo and Magnini, Bernardo and Dagan, Ido and Dolan, Bill},
  booktitle={Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing},
  pages={1--9},
  year={2007},
  organization={Association for Computational Linguistics}
}
@inproceedings{bentivogli2009fifth,
  title={The Fifth PASCAL Recognizing Textual Entailment Challenge.},
  author={Bentivogli, Luisa and Clark, Peter and Dagan, Ido and Giampiccolo, Danilo},
  booktitle={TAC},
  year={2009}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/wnli

  • Config description: The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, we construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so if a model memorizes the training examples, it will predict the wrong label on the corresponding development set example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence between a model's score on this task and its score on the unconverted original task. We call the converted dataset WNLI (Winograd NLI).

  • Homepage: https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html

  • Download size: 28.32 KiB

  • Dataset size: 198.88 KiB

  • Splits:

Split                     Examples
'test'                    146
'train'                   635
'validation'              71
  • Features:
FeaturesDict({
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string),
    'sentence2': Text(shape=(), dtype=tf.string),
})
  • Citation:
@inproceedings{levesque2012winograd,
  title={The winograd schema challenge},
  author={Levesque, Hector and Davis, Ernest and Morgenstern, Leora},
  booktitle={Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning},
  year={2012}
}
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.

glue/ax

  • Config description: A manually-curated evaluation dataset for fine-grained analysis of system performance on a broad range of linguistic phenomena. This dataset evaluates sentence understanding through Natural Language Inference (NLI) problems. Use a model trained on MultiNLI to produce predictions for this dataset.

  • Homepage: https://gluebenchmark.com/diagnostics

  • Download size: 217.05 KiB

  • Dataset size: 299.16 KiB

  • Splits:

Split                     Examples
'test'                    1,104
  • Features:
FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string),
    'idx': tf.int32,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Text(shape=(), dtype=tf.string),
})
  • Citation:
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see
the correct citation for each contained dataset.
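
Since glue/ax ships only an unlabeled test split, the intended workflow is to score it with a model trained on MultiNLI. A hedged sketch of that loop follows; predict_fn is a hypothetical stand-in for such a trained classifier, not part of TFDS:

import tensorflow_datasets as tfds

ds = tfds.load('glue/ax', split='test')

def predict_fn(premise, hypothesis):
    # Hypothetical: replace with a real MultiNLI-trained classifier that
    # returns an integer class index for the (premise, hypothesis) pair.
    raise NotImplementedError

predictions = []
for ex in ds:
    predictions.append(predict_fn(ex['premise'].numpy().decode('utf-8'),
                                  ex['hypothesis'].numpy().decode('utf-8')))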