race

  • Description:

RACE is a large-scale reading comprehension dataset with close to 28,000 passages and nearly 100,000 questions. It was collected from English examinations in China designed for middle school and high school students, and it can serve as training and test data for machine comprehension.

  • Feature structure:

FeaturesDict({
    'answers': Sequence(Text(shape=(), dtype=string)),
    'article': Text(shape=(), dtype=string),
    'example_id': Text(shape=(), dtype=string),
    'options': Sequence(Sequence(Text(shape=(), dtype=string))),
    'questions': Sequence(Text(shape=(), dtype=string)),
})
  • Feature documentation:
Feature     Class                     Shape         Dtype   Description
            FeaturesDict
answers     Sequence(Text)            (None,)       string
article     Text                                    string
example_id  Text                                    string
options     Sequence(Sequence(Text))  (None, None)  string
questions   Sequence(Text)            (None,)       string
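
Because each example bundles one article with parallel lists of questions, options, and answers, it is often convenient to flatten a record into one item per question. The following is a minimal sketch under stated assumptions: it works on a plain Python dict with the feature names listed above, it assumes each entry in `answers` is a letter label ('A', 'B', ...) that indexes into the matching `options` list, and both the `QAItem` type and the toy record are hypothetical, invented for illustration.

```python
from typing import Dict, Iterator, List, NamedTuple


class QAItem(NamedTuple):
    """One multiple-choice question paired with its passage (hypothetical helper type)."""
    example_id: str
    article: str
    question: str
    options: List[str]
    answer_text: str


def flatten_example(example: Dict) -> Iterator[QAItem]:
    """Yield one QAItem per question in a single RACE-style record.

    Assumption: each entry of `answers` is a letter label ('A', 'B', ...)
    that indexes into the matching entry of `options`.
    """
    for question, options, answer in zip(
        example["questions"], example["options"], example["answers"]
    ):
        index = ord(answer) - ord("A")  # 'A' -> 0, 'B' -> 1, ...
        yield QAItem(
            example_id=example["example_id"],
            article=example["article"],
            question=question,
            options=list(options),
            answer_text=options[index],
        )


# Toy record invented for illustration; it is not taken from the dataset.
toy = {
    "example_id": "toy-0",
    "article": "Tom walks to school every day. He likes math best.",
    "questions": [
        "How does Tom get to school?",
        "Which subject does Tom like best?",
    ],
    "options": [
        ["By bus", "On foot", "By bike", "By car"],
        ["Art", "Music", "Math", "History"],
    ],
    "answers": ["B", "C"],
}

items = list(flatten_example(toy))
for item in items:
    print(item.question, "->", item.answer_text)
```

Records read through a data-loading library may arrive as byte strings or tensors rather than Python `str` values, in which case they would need decoding before being passed to a helper like this one.
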
  • Citation:

@article{lai2017large,
    title={RACE: Large-scale ReAding Comprehension Dataset From Examinations},
    author={Lai, Guokun and Xie, Qizhe and Liu, Hanxiao and Yang, Yiming and Hovy, Eduard},
    journal={arXiv preprint arXiv:1704.04683},
    year={2017}
}

race/high (default config)

  • Dataset size: 52.39 MiB

  • Splits:

Split Examples
'dev' 1,021
'test' 1,045
'train' 18,728

race/middle

  • Dataset size: 12.51 MiB

  • Splits:

Split Examples
'dev' 368
'test' 362
'train' 6,409