natural_questions

Description:

The NQ corpus contains questions from real users and requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The use of real user questions, together with the requirement that a solution read the whole page to find the answer, makes NQ a more realistic and challenging task than earlier QA datasets.

Homepage: https://ai.google.com/research/NaturalQuestions/dataset
Source code: tfds.datasets.natural_questions.Builder
Versions:
- 0.0.2: No release notes.
- 0.1.0 (default): No release notes.
Download size: 41.97 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	307,373
`'validation'`	7,830

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
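
The metadata above (configs, default version, splits) can be turned into a loading call. What follows is a minimal sketch rather than an official recipe: the helper names `dataset_name` and `load_split` are hypothetical, and TFDS is imported lazily because the first call triggers the full 41.97 GiB download.

```python
def dataset_name(config: str = "default", version: str = "0.1.0") -> str:
    """Build a TFDS name string such as 'natural_questions/longt5:0.1.0'."""
    return f"natural_questions/{config}:{version}"


def load_split(config: str = "default", split: str = "validation"):
    """Load one split of the dataset (hypothetical convenience wrapper)."""
    # Imported here so that the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    # The first call downloads and prepares the data, which is slow and large.
    return tfds.load(dataset_name(config), split=split)


# Split sizes as listed in the table above.
SPLIT_SIZES = {"train": 307_373, "validation": 7_830}

# Example usage (not executed here because of the download size):
#   ds = load_split("longt5", "validation")
#   for example in ds.take(1):
#       print(example["question"])
```

Pinning the version in the name string keeps a script reproducible if the default version changes later.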

@article{47761,
title = {Natural Questions: a Benchmark for Question Answering Research},
author = {Tom Kwiatkowski and Jennimaria Palomaki and Olivia Redfield and Michael Collins and Ankur Parikh and Chris Alberti and Danielle Epstein and Illia Polosukhin and Matthew Kelcey and Jacob Devlin and Kenton Lee and Kristina N. Toutanova and Llion Jones and Ming-Wei Chang and Andrew Dai and Jakob Uszkoreit and Quoc Le and Slav Petrov},
year = {2019},
journal = {Transactions of the Association of Computational Linguistics}
}

natural_questions/default (default config)

Config description: Default natural_questions config
Dataset size: 90.26 GiB
Feature structure:

FeaturesDict({
    'annotations': Sequence({
        'id': string,
        'long_answer': FeaturesDict({
            'end_byte': int64,
            'end_token': int64,
            'start_byte': int64,
            'start_token': int64,
        }),
        'short_answers': Sequence({
            'end_byte': int64,
            'end_token': int64,
            'start_byte': int64,
            'start_token': int64,
            'text': Text(shape=(), dtype=string),
        }),
        'yes_no_answer': ClassLabel(shape=(), dtype=int64, num_classes=2),
    }),
    'document': FeaturesDict({
        'html': Text(shape=(), dtype=string),
        'title': Text(shape=(), dtype=string),
        'tokens': Sequence({
            'is_html': bool,
            'token': Text(shape=(), dtype=string),
        }),
        'url': Text(shape=(), dtype=string),
    }),
    'id': string,
    'question': FeaturesDict({
        'text': Text(shape=(), dtype=string),
        'tokens': Sequence(string),
    }),
})
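
To illustrate how the token offsets relate to the document tokens, here is a small sketch that recovers the long answer for one annotation. It runs on a hand-made mock example, not real data. Two assumptions are worth stating: a `Sequence` of dict features is represented as a dict of lists (as TFDS does), and `end_token` is treated as exclusive with a negative value meaning "no long answer". The function name `long_answer_tokens` is hypothetical.

```python
from typing import Dict, List


def long_answer_tokens(example: Dict, annotation_index: int = 0,
                       keep_html: bool = False) -> List[str]:
    """Return the tokens covered by one annotation's long answer."""
    long_answer = example["annotations"]["long_answer"]
    start = long_answer["start_token"][annotation_index]
    end = long_answer["end_token"][annotation_index]
    # Assumption: a negative offset marks an annotation without a long answer.
    if start < 0 or end < 0:
        return []
    tokens = example["document"]["tokens"]
    # Assumption: end_token is exclusive, so a plain slice covers the span.
    span = zip(tokens["token"][start:end], tokens["is_html"][start:end])
    return [tok for tok, is_html in span if keep_html or not is_html]


# Mock example with plain Python strings; real examples hold byte strings.
example = {
    "annotations": {
        "id": ["a0"],
        "long_answer": {"start_token": [1], "end_token": [7]},
    },
    "document": {
        "tokens": {
            "token": ["<H1>", "<P>", "Paris", "is", "the", "capital", "</P>"],
            "is_html": [True, True, False, False, False, False, True],
        }
    },
}

print(" ".join(long_answer_tokens(example)))
```

Filtering on `is_html` drops the markup tokens, which is usually what a reader wants when inspecting an answer; pass `keep_html=True` to see the raw span.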

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
annotations	Sequence
annotations/id	Tensor		string
annotations/long_answer	FeaturesDict
annotations/long_answer/end_byte	Tensor		int64
annotations/long_answer/end_token	Tensor		int64
annotations/long_answer/start_byte	Tensor		int64
annotations/long_answer/start_token	Tensor		int64
annotations/short_answers	Sequence
annotations/short_answers/end_byte	Tensor		int64
annotations/short_answers/end_token	Tensor		int64
annotations/short_answers/start_byte	Tensor		int64
annotations/short_answers/start_token	Tensor		int64
annotations/short_answers/text	Text		string
annotations/yes_no_answer	ClassLabel		int64
document	FeaturesDict
document/html	Text		string
document/title	Text		string
document/tokens	Sequence
document/tokens/is_html	Tensor		bool
document/tokens/token	Text		string
document/url	Text		string
id	Tensor		string
question	FeaturesDict
question/text	Text		string
question/tokens	Sequence(Tensor)	(None,)	string
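
The table addresses nested features with slash-separated paths. As a rough sketch of that naming scheme, the hypothetical helpers below flatten a nested dict into such paths and look a value up by path. They operate on ordinary Python dicts and are not part of the TFDS API.

```python
from typing import Any, Dict, List


def flatten_paths(tree: Dict[str, Any], prefix: str = "") -> List[str]:
    """List every node of a nested dict as a slash-separated path."""
    paths = []
    for key in sorted(tree):
        path = f"{prefix}/{key}" if prefix else key
        paths.append(path)
        if isinstance(tree[key], dict):
            # Recurse into nested feature dicts, extending the prefix.
            paths.extend(flatten_paths(tree[key], path))
    return paths


def get_path(tree: Dict[str, Any], path: str) -> Any:
    """Fetch the value stored under a slash-separated path."""
    node = tree
    for key in path.split("/"):
        node = node[key]
    return node


# A reduced mock of the structure above, with dtypes as leaf values.
features = {
    "id": "string",
    "question": {"text": "string", "tokens": "string"},
}

print(flatten_paths(features))
```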


natural_questions/longt5

Config description: natural_questions preprocessed as in the longT5 benchmark
Dataset size: 8.91 GiB
Feature structure:

FeaturesDict({
    'all_answers': Sequence(Text(shape=(), dtype=string)),
    'answer': Text(shape=(), dtype=string),
    'context': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'question': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
all_answers	Sequence(Text)	(None,)	string
answer	Text		string
context	Text		string
id	Text		string
question	Text		string
title	Text		string
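
Because each longt5 record carries both a single `answer` and a list of `all_answers`, one plausible use is to score a prediction against every accepted answer. The sketch below assumes a simple lowercase and whitespace normalization; it is not the metric used by the longT5 benchmark, and the record values are invented for illustration.

```python
from typing import Dict


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, record: Dict) -> bool:
    """True if the prediction equals any accepted answer after normalizing."""
    accepted = {normalize(answer) for answer in record["all_answers"]}
    return normalize(prediction) in accepted


# Hypothetical record following the longt5 feature structure.
record = {
    "id": "0",
    "title": "Paris",
    "question": "what is the capital of france",
    "context": "Paris is the capital of France.",
    "answer": "Paris",
    "all_answers": ["Paris", "the city of Paris"],
}

print(exact_match("  PARIS ", record))
```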

