qasc

  • Description:

QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.

Split Examples
'test' 920
'train' 8,134
'validation' 926
  • Feature structure:
FeaturesDict({
    'answerKey': Text(shape=(), dtype=string),
    'choices': Sequence({
        'label': Text(shape=(), dtype=string),
        'text': Text(shape=(), dtype=string),
    }),
    'combinedfact': Text(shape=(), dtype=string),
    'fact1': Text(shape=(), dtype=string),
    'fact2': Text(shape=(), dtype=string),
    'formatted_question': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'question': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature Class Shape Dtype Description
FeaturesDict
answerKey Text string
choices Sequence
choices/label Text string
choices/text Text string
combinedfact Text string
fact1 Text string
fact2 Text string
formatted_question Text string
id Text string
question Text string
  • Citation:
@article{allenai:qasc,
      author    = {Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal},
      title     = {QASC: A Dataset for Question Answering via Sentence Composition},
      journal   = {arXiv:1910.11473v2},
      year      = {2020},
}