scientific_papers

Description:

The scientific papers dataset contains two sets of long, structured documents. The datasets are obtained from the ArXiv and PubMed OpenAccess repositories.

Both "arxiv" and "pubmed" have three features:

article: the body of the document, with paragraphs separated by "\n".
abstract: the abstract of the document, with paragraphs separated by "\n".
section_names: the titles of the sections, separated by "\n".
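
Because all three features are single strings with a separator between items, most pipelines start by splitting them back into lists. The following is a minimal sketch, assuming the separator is a newline and that the values have already been decoded from bytes to `str`. The helper `split_fields` and the sample record are hypothetical and are not part of the dataset or of TFDS.

```python
from typing import Dict, List


def split_fields(example: Dict[str, str], sep: str = "\n") -> Dict[str, List[str]]:
    """Split each separator-delimited feature into a list of non-empty strings."""
    return {
        name: [part for part in text.split(sep) if part.strip()]
        for name, text in example.items()
    }


# Hypothetical record, made up for illustration (not taken from the dataset).
example = {
    "article": "First paragraph.\nSecond paragraph.",
    "abstract": "A one-paragraph abstract.",
    "section_names": "Introduction\nMethods",
}

fields = split_fields(example)
print(len(fields["article"]), "paragraphs;", "sections:", fields["section_names"])
```

Empty fragments are dropped so that a trailing or doubled separator does not produce blank paragraphs.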
Additional documentation: Explore on Papers With Code
Homepage: https://github.com/armancohan/long-summarization
Source code: tfds.datasets.scientific_papers.Builder
Versions:
- 1.1.0: No release notes.
- 1.1.1 (default): No release notes.
Download size: 4.20 GiB
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'abstract': Text(shape=(), dtype=string),
    'article': Text(shape=(), dtype=string),
    'section_names': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
abstract	Text	string
article	Text	string
section_names	Text	string

Supervised keys (see as_supervised doc): ('article', 'abstract')
Figure (tfds.show_examples): Not supported.
Citation:

@article{Cohan_2018,
   title={A Discourse-Aware Attention Model for Abstractive Summarization of
            Long Documents},
   url={http://dx.doi.org/10.18653/v1/n18-2097},
   DOI={10.18653/v1/n18-2097},
   journal={Proceedings of the 2018 Conference of the North American Chapter of
          the Association for Computational Linguistics: Human Language
          Technologies, Volume 2 (Short Papers)},
   publisher={Association for Computational Linguistics},
   author={Cohan, Arman and Dernoncourt, Franck and Kim, Doo Soon and Bui, Trung and Kim, Seokhwan and Chang, Walter and Goharian, Nazli},
   year={2018}
}
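
The supervised keys `('article', 'abstract')` mean that loading with `as_supervised=True` yields `(article, abstract)` pairs instead of feature dictionaries. The sketch below shows one way to do that with `tfds.load`. The helpers `dataset_name` and `load_pairs` are hypothetical names introduced here for illustration. The TFDS import is kept inside `load_pairs` so that nothing is downloaded until that function is actually called.

```python
VALID_CONFIGS = ("arxiv", "pubmed")


def dataset_name(config: str = "arxiv") -> str:
    """Build the TFDS name string '<dataset>/<config>' for a known config."""
    if config not in VALID_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {VALID_CONFIGS}")
    return f"scientific_papers/{config}"


def load_pairs(config: str = "arxiv", split: str = "validation"):
    """Return a tf.data.Dataset of (article, abstract) byte-string pairs.

    The first call downloads and prepares the data, which takes several GiB.
    """
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    return tfds.load(dataset_name(config), split=split, as_supervised=True)


print(dataset_name("pubmed"))

# Usage (not run here because it triggers the download):
#   for article, abstract in load_pairs("pubmed").take(1):
#       print(abstract.numpy().decode("utf-8")[:200])
```

The tensors hold UTF-8 encoded bytes, so they need `.numpy().decode("utf-8")` before any string processing in plain Python.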

scientific_papers/arxiv (default config)

Config description: Documents from the ArXiv repository.
Dataset size: 7.07 GiB
Splits:

Split	Examples
`'test'`	6,440
`'train'`	203,037
`'validation'`	6,436

Examples (tfds.as_dataframe):

scientific_papers/pubmed

Config description: Documents from the PubMed repository.
Dataset size: 2.34 GiB
Splits:

Split	Examples
`'test'`	6,658
`'train'`	119,924
`'validation'`	6,633
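
The split tables for the two configs can be summarized with a little arithmetic, which is useful when checking that a prepared copy has the expected size. This sketch hard-codes the example counts listed in this section. The dictionary `SPLITS` and the function `split_fractions` are illustrative names, not part of TFDS.

```python
# Example counts copied from the split tables in this section.
SPLITS = {
    "arxiv": {"train": 203_037, "validation": 6_436, "test": 6_440},
    "pubmed": {"train": 119_924, "validation": 6_633, "test": 6_658},
}


def split_fractions(counts):
    """Return each split's share of the config's total example count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for config, counts in SPLITS.items():
    total = sum(counts.values())
    shares = ", ".join(f"{k}={v:.1%}" for k, v in split_fractions(counts).items())
    print(f"{config}: {total:,} examples ({shares})")
```

In both configs the training split holds roughly nine tenths of the examples or more, with the remainder divided almost evenly between validation and test.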

Examples (tfds.as_dataframe):