xquad

  • Description:

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages. XQuAD itself provides only evaluation data. To run it in the default zero-shot setting, fine-tune on the English SQuAD v1.1 training and validation data, available at https://www.tensorflow.org/datasets/catalog/squad, and evaluate on the XQuAD test split of the target language.

We also include "translate-train", "translate-dev", and "translate-test" splits for each non-English language from XTREME (Hu et al., 2020). These can be used to run XQuAD in the "translate-train" or "translate-test" settings.
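
As a rough illustration of how the configs and splits described on this page fit together, the sketch below builds a config name and checks the requested split before calling `tfds.load`. The helper names (`xquad_config`, `available_splits`, `load_xquad`) are hypothetical and not part of TFDS; only `tfds.load` is a library call. The language codes and split names are the ones listed on this page.

```python
# Language codes listed on this page; 'en' ships only the 'test' split.
XQUAD_LANGUAGES = ("ar", "de", "el", "es", "hi", "ru", "th", "tr", "vi", "zh", "en")
TRANSLATE_SPLITS = ("translate-train", "translate-dev", "translate-test")


def xquad_config(lang: str) -> str:
    """Return the TFDS config name for a language code, e.g. 'xquad/de'."""
    if lang not in XQUAD_LANGUAGES:
        raise ValueError(f"Unknown XQuAD language: {lang!r}")
    return f"xquad/{lang}"


def available_splits(lang: str) -> tuple:
    """Return the split names this page lists for the given language."""
    xquad_config(lang)  # validates the language code
    if lang == "en":
        return ("test",)
    return ("test",) + TRANSLATE_SPLITS


def load_xquad(lang: str, split: str = "test"):
    """Hypothetical wrapper around tfds.load that validates the split first."""
    if split not in available_splits(lang):
        raise ValueError(f"Split {split!r} is not listed for language {lang!r}")
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds
    return tfds.load(xquad_config(lang), split=split)
```

For example, `load_xquad("de", "translate-train")` would request the machine-translated German training data. The first call downloads and prepares the config, which can take a while given the download sizes listed below.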

  • Feature structure:

FeaturesDict({
    'answers': Sequence({
        'answer_start': tf.int32,
        'text': Text(shape=(), dtype=tf.string),
    }),
    'context': Text(shape=(), dtype=tf.string),
    'id': tf.string,
    'question': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
})
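
To make the feature structure concrete, the following sketch reads the `answers` sequence of one example and checks that each answer text occurs in the context at its `answer_start` offset. It assumes that `answer_start` is a character offset into the UTF-8 decoded context, as in SQuAD v1.1, and that string features arrive as `bytes` (as they do when iterating with `tfds.as_numpy`). The function names and the sample record are hypothetical; the record is hand-written, not taken from the dataset.

```python
def to_text(value):
    """Decode bytes to str; pass str values through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def answer_spans(example: dict) -> list:
    """Return (start, answer_text, matches_context) for each answer.

    A Sequence of dict features is exposed as a dict of parallel lists,
    so 'answer_start' and 'text' are zipped together here.
    """
    context = to_text(example["context"])
    starts = example["answers"]["answer_start"]
    texts = example["answers"]["text"]
    spans = []
    for start, text in zip(starts, texts):
        start, text = int(start), to_text(text)
        # Assumption: offsets count characters, not bytes.
        matches = context[start:start + len(text)] == text
        spans.append((start, text, matches))
    return spans


# Hypothetical record that mirrors the feature structure above.
example = {
    "id": b"demo-0",
    "title": b"Demo",
    "context": b"XQuAD covers eleven languages.",
    "question": b"How many languages does XQuAD cover?",
    "answers": {"answer_start": [13], "text": [b"eleven"]},
}

print(answer_spans(example))
```

Decoding before slicing matters for the non-English configs, where one character can occupy several bytes and a byte-based slice would not line up with a character offset.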

  • Citation:

@article{Artetxe:etal:2019,
      author    = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
      title     = {On the cross-lingual transferability of monolingual representations},
      journal   = {CoRR},
      volume    = {abs/1910.11856},
      year      = {2019},
      archivePrefix = {arXiv},
      eprint    = {1910.11856}
}

xquad/ar (default config)

  • Config description: XQuAD 'ar' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 420.97 MiB

  • Dataset size: 134.78 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,541
'translate-test'   1,151
'translate-train'  86,787

xquad/de

  • Config description: XQuAD 'de' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 127.04 MiB

  • Dataset size: 98.75 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,371
'translate-test'   1,168
'translate-train'  82,603

xquad/el

  • Config description: XQuAD 'el' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 499.40 MiB

  • Dataset size: 157.85 MiB

  • Auto-cached (documentation): Yes (test, translate-dev, translate-test), Only when shuffle_files=False (translate-train)

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,100
'translate-test'   1,182
'translate-train'  79,946

xquad/es

  • Config description: XQuAD 'es' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 138.41 MiB

  • Dataset size: 104.91 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,566
'translate-test'   1,188
'translate-train'  87,488

xquad/hi

  • Config description: XQuAD 'hi' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 472.23 MiB

  • Dataset size: 207.80 MiB

  • Auto-cached (documentation): Yes (test, translate-dev, translate-test), Only when shuffle_files=False (translate-train)

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,536
'translate-test'   1,184
'translate-train'  85,804

xquad/ru

  • Config description: XQuAD 'ru' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 513.80 MiB

  • Dataset size: 159.33 MiB

  • Auto-cached (documentation): Yes (test, translate-dev, translate-test), Only when shuffle_files=False (translate-train)

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,469
'translate-test'   1,190
'translate-train'  84,869

xquad/th

  • Config description: XQuAD 'th' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 461.54 MiB

  • Dataset size: 199.52 MiB

  • Auto-cached (documentation): Yes (test, translate-dev, translate-test), Only when shuffle_files=False (translate-train)

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,516
'translate-test'   1,157
'translate-train'  85,846

xquad/tr

  • Config description: XQuAD 'tr' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 151.08 MiB

  • Dataset size: 97.51 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,535
'translate-test'   1,112
'translate-train'  86,511

xquad/vi

  • Config description: XQuAD 'vi' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 218.09 MiB

  • Dataset size: 119.98 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,555
'translate-test'   1,178
'translate-train'  87,187

xquad/zh

  • Config description: XQuAD 'zh' test split, with machine-translated translate-train/translate-dev/translate-test splits from XTREME (Hu et al., 2020).

  • Download size: 174.57 MiB

  • Dataset size: 80.74 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
'translate-dev'    10,475
'translate-test'   1,186
'translate-train'  85,700

xquad/en

  • Config description: XQuAD 'en' test split.

  • Download size: 595.10 KiB

  • Dataset size: 1.19 MiB

  • Auto-cached (documentation): Yes

  • Splits:

Split              Examples
'test'             1,190
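
The split tables above show that 'translate-test' is smaller than 'test' for most languages, so the two splits cannot be assumed to align example by example. The sketch below tabulates the difference using only the counts listed on this page; the variable names are illustrative, and the page does not state why the counts differ.

```python
# Example counts copied from the split tables on this page.
TEST_EXAMPLES = 1190
TRANSLATE_TEST_EXAMPLES = {
    "ar": 1151, "de": 1168, "el": 1182, "es": 1188, "hi": 1184,
    "ru": 1190, "th": 1157, "tr": 1112, "vi": 1178, "zh": 1186,
}

# Number of 'test' examples with no counterpart in 'translate-test'.
shortfall = {
    lang: TEST_EXAMPLES - count
    for lang, count in TRANSLATE_TEST_EXAMPLES.items()
}

# Print languages from the largest shortfall to the smallest.
for lang, missing in sorted(shortfall.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {missing} missing ({missing / TEST_EXAMPLES:.1%})")
```

When comparing zero-shot and translate-test results, it may be safer to match examples by their `id` field rather than by position.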