xtreme_pawsx

  • Description:

This dataset contains machine translations of the English PAWS training data. The translations are provided by the XTREME benchmark and cover the following languages:

  • French
  • Spanish
  • German
  • Chinese
  • Japanese
  • Korean
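Each language above corresponds to one config of the dataset, named `xtreme_pawsx/<code>` with a two-letter language code (the config sections further down list `de`, `es`, `fr`, `ja`, `ko` and `zh`). The sketch below maps the language names to those config names. `LANGUAGE_TO_CONFIG` and `config_name` are hypothetical helpers for illustration, not part of TensorFlow Datasets.

```python
# Hypothetical helper (not part of TFDS): maps the language names listed
# above to the xtreme_pawsx config codes used in this catalog.
LANGUAGE_TO_CONFIG = {
    "French": "fr",
    "Spanish": "es",
    "German": "de",
    "Chinese": "zh",
    "Japanese": "ja",
    "Korean": "ko",
}


def config_name(language: str) -> str:
    """Return the full dataset/config name, e.g. 'xtreme_pawsx/fr'."""
    try:
        code = LANGUAGE_TO_CONFIG[language]
    except KeyError:
        raise ValueError(f"No xtreme_pawsx config for language: {language!r}") from None
    return f"xtreme_pawsx/{code}"


if __name__ == "__main__":
    for lang in LANGUAGE_TO_CONFIG:
        print(lang, "->", config_name(lang))
    # With TensorFlow Datasets installed, a name like this is what you would
    # pass to tfds.load, e.g. tfds.load(config_name("French"), split="train").
```

English has no entry because the English PAWS data is the source of these translations rather than one of the translated configs.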

For further details on PAWS, see the papers "PAWS: Paraphrase Adversaries from Word Scrambling" (https://arxiv.org/abs/1904.01130) and "PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification" (https://arxiv.org/abs/1908.11828).

For details on XTREME, see "XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization" (https://arxiv.org/abs/2003.11080).

  • Features:

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string),
    'sentence2': Text(shape=(), dtype=tf.string),
})
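As a sketch of what one record of this FeaturesDict looks like in Python: when a loaded split is iterated with `tfds.as_numpy`, `Text` features arrive as UTF-8 byte strings and the `ClassLabel` feature as an integer. The helper below decodes such a record into a typed tuple. `PawsPair` and `decode_example` are hypothetical names, and reading label 1 as "paraphrase" assumes the original PAWS labelling convention.

```python
from typing import Mapping, NamedTuple, Union

# Assumption: records come from tfds.as_numpy, so Text features are bytes
# and the ClassLabel feature is an integer (num_classes=2).
RawValue = Union[bytes, str, int]


class PawsPair(NamedTuple):
    sentence1: str
    sentence2: str
    label: int  # assumed PAWS convention: 1 = paraphrase, 0 = not a paraphrase


def _to_text(value: Union[bytes, str]) -> str:
    """Decode UTF-8 bytes to str; pass str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def decode_example(example: Mapping[str, RawValue]) -> PawsPair:
    """Convert one raw record into a PawsPair, validating the label range."""
    label = int(example["label"])
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1 (num_classes=2), got {label}")
    return PawsPair(
        sentence1=_to_text(example["sentence1"]),
        sentence2=_to_text(example["sentence2"]),
        label=label,
    )


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the dataset.
    raw = {
        "label": 1,
        "sentence1": "Der Hund schläft im Garten.".encode("utf-8"),
        "sentence2": "Im Garten schläft der Hund.".encode("utf-8"),
    }
    print(decode_example(raw))
```

Decoding explicitly as UTF-8 matters for the `ja`, `ko` and `zh` configs, whose text is almost entirely non-ASCII.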

  • Citation:

@article{hu2020xtreme,
      author    = {Junjie Hu and Sebastian Ruder and Aditya Siddhant and Graham Neubig and Orhan Firat and Melvin Johnson},
      title     = {XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization},
      journal   = {CoRR},
      volume    = {abs/2003.11080},
      year      = {2020},
      archivePrefix = {arXiv},
      eprint    = {2003.11080}
}

xtreme_pawsx/de (default config)

  • Config description: Translated to German (de)

  • Download size: 22.34 MiB

  • Dataset size: 14.19 MiB

  • Splits:

Split     Examples
'train'   49,340

xtreme_pawsx/es

  • Config description: Translated to Spanish (es)

  • Download size: 22.27 MiB

  • Dataset size: 14.09 MiB

  • Splits:

Split     Examples
'train'   49,244

xtreme_pawsx/fr

  • Config description: Translated to French (fr)

  • Download size: 22.70 MiB

  • Dataset size: 14.53 MiB

  • Splits:

Split     Examples
'train'   49,208

xtreme_pawsx/ja

  • Config description: Translated to Japanese (ja)

  • Download size: 25.12 MiB

  • Dataset size: 16.98 MiB

  • Splits:

Split     Examples
'train'   49,086

xtreme_pawsx/ko

  • Config description: Translated to Korean (ko)

  • Download size: 22.99 MiB

  • Dataset size: 14.86 MiB

  • Splits:

Split     Examples
'train'   49,298

xtreme_pawsx/zh

  • Config description: Translated to Chinese (zh)

  • Download size: 21.45 MiB

  • Dataset size: 13.21 MiB

  • Splits:

Split     Examples
'train'   49,149
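Every config has only a 'train' split, and the counts differ slightly between languages. The sketch below totals the per-config train sizes listed on this page and reports the smallest and largest config. The numbers are copied from the split tables above; the `TRAIN_EXAMPLES` dictionary is just an illustrative container.

```python
# Train-split sizes as listed in this catalog for each xtreme_pawsx config.
TRAIN_EXAMPLES = {
    "de": 49_340,
    "es": 49_244,
    "fr": 49_208,
    "ja": 49_086,
    "ko": 49_298,
    "zh": 49_149,
}

total = sum(TRAIN_EXAMPLES.values())
smallest = min(TRAIN_EXAMPLES, key=TRAIN_EXAMPLES.get)
largest = max(TRAIN_EXAMPLES, key=TRAIN_EXAMPLES.get)

print(f"total train examples across configs: {total:,}")  # 295,325
print(f"smallest: {smallest} ({TRAIN_EXAMPLES[smallest]:,})")  # ja (49,086)
print(f"largest: {largest} ({TRAIN_EXAMPLES[largest]:,})")  # de (49,340)
```

The spread between the largest and smallest config is 254 examples, so the six translated training sets are close to, but not exactly, the same size.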