
qasc

Description:

QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.

Additional Documentation: Explore on Papers With Code
Homepage: https://allenai.org/data/qasc
Source code: tfds.datasets.qasc.Builder
Versions:
- 0.1.0 (default): No release notes.
Download size: 1.54 MiB
Dataset size: 6.61 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	920
`'train'`	8,134
`'validation'`	926
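
As a minimal sketch of how these splits might be loaded, the snippet below wraps `tfds.load` in a small helper and records the split sizes from the table so they can be checked against the total quoted in the description. The helper names `load_qasc` and `check_split_size` and the constant `EXPECTED_SPLIT_SIZES` are illustrative, not part of TFDS.

```python
# Split sizes copied from the table above.
EXPECTED_SPLIT_SIZES = {"train": 8134, "validation": 926, "test": 920}


def load_qasc(split="train"):
    """Load one QASC split with TFDS (data is downloaded on first use)."""
    # Imported lazily so the rest of this snippet runs without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("qasc", split=split)


def check_split_size(split, num_examples):
    """Return True if num_examples matches the documented size of the split."""
    return EXPECTED_SPLIT_SIZES[split] == num_examples


# The three splits should add up to the 9,980 questions in the description.
total = sum(EXPECTED_SPLIT_SIZES.values())
print(total)  # 9980
```

Calling `load_qasc("validation")` should return a `tf.data.Dataset` of feature dictionaries; counting its elements and passing the count to `check_split_size` is one way to confirm the download is complete.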

Feature structure:

FeaturesDict({
    'answerKey': Text(shape=(), dtype=string),
    'choices': Sequence({
        'label': Text(shape=(), dtype=string),
        'text': Text(shape=(), dtype=string),
    }),
    'combinedfact': Text(shape=(), dtype=string),
    'fact1': Text(shape=(), dtype=string),
    'fact2': Text(shape=(), dtype=string),
    'formatted_question': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'question': Text(shape=(), dtype=string),
})
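
The structure above stores `choices` as a sequence of `label`/`text` pairs, which TFDS typically exposes as a dictionary of parallel lists. The sketch below shows one way to look up the text of the correct choice from `answerKey`. It assumes examples shaped like the output of `tfds.as_numpy` (byte strings); the helper names and the toy example are made up for illustration and are not taken from the dataset.

```python
def _to_str(value):
    """Decode bytes (as yielded by tfds.as_numpy) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def answer_text(example):
    """Return the text of the choice whose label equals answerKey, else None."""
    key = _to_str(example["answerKey"])
    labels = [_to_str(label) for label in example["choices"]["label"]]
    texts = [_to_str(text) for text in example["choices"]["text"]]
    if key not in labels:
        return None  # no choice carries this label
    return texts[labels.index(key)]


# Hypothetical example with only the fields the helper reads.
toy_example = {
    "answerKey": b"B",
    "choices": {
        "label": [b"A", b"B"],
        "text": [b"first option", b"second option"],
    },
}
print(answer_text(toy_example))  # second option
```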

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
answerKey	Text	string
choices	Sequence
choices/label	Text	string
choices/text	Text	string
combinedfact	Text	string
fact1	Text	string
fact2	Text	string
formatted_question	Text	string
id	Text	string
question	Text	string

Supervised keys (See as_supervised doc): None
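
Because no supervised keys are defined, `(input, label)` pairs have to be built by hand. The following is a rough sketch of one possible mapping, from `formatted_question` to an integer class index. It assumes the eight choices are labelled `A` through `H`, which this page does not state, and the function name `to_supervised` is hypothetical.

```python
LABELS = "ABCDEFGH"  # assumption: eight choices labelled A..H


def to_supervised(example):
    """Map a decoded example to (question_text, label_index).

    label_index is -1 when answerKey is empty or not one of LABELS.
    """
    def decode(value):
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    question = decode(example["formatted_question"])
    key = decode(example["answerKey"])
    # The length check guards against the empty string, which is "in" any str.
    label = LABELS.index(key) if len(key) == 1 and key in LABELS else -1
    return question, label
```

Applied to each example of `tfds.as_numpy(ds)`, this yields pairs that can feed an 8-way classifier.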
Figure (tfds.show_examples): Not supported.

Citation:

@article{allenai:qasc,
      author    = {Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal},
      title     = {QASC: A Dataset for Question Answering via Sentence Composition},
      journal   = {arXiv:1910.11473v2},
      year      = {2020},
}