asset

  • Description:

ASSET is a dataset for evaluating sentence simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations." The corpus contains 2,000 validation and 359 test original sentences, each simplified 10 times by different annotators. It also contains human judgments of meaning preservation, fluency, and simplicity for the outputs of several automatic text simplification systems.

@inproceedings{alva-manchego-etal-2020-asset,
    title = "{ASSET}: {A} Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations",
    author = "Alva-Manchego, Fernando  and
      Martin, Louis  and
      Bordes, Antoine  and
      Scarton, Carolina  and
      Sagot, Benoit  and
      Specia, Lucia",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.424",
    pages = "4668--4679",
}

asset/simplification (default config)

  • Config description: A set of original sentences aligned with 10 possible simplifications for each.

  • Dataset size: 2.64 MiB

  • Splits:

Split          Examples
'test'         359
'validation'   2,000

  • Features:
FeaturesDict({
    'original': Text(shape=(), dtype=tf.string),
    'simplifications': Sequence(Text(shape=(), dtype=tf.string)),
})
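
Since each example pairs one `original` string with a sequence of `simplifications`, a common first step is to flatten it into (original, reference) pairs. The following is a minimal sketch, not an official recipe: `to_pairs` and `iter_asset_pairs` are hypothetical helper names, and the sketch assumes the `tensorflow_datasets` package is installed and that text features arrive as UTF-8 bytes when iterated through `tfds.as_numpy`.

```python
from typing import Iterator, List, Mapping, Tuple, Union

TextLike = Union[str, bytes]


def _to_str(value: TextLike) -> str:
    """Decode bytes (as yielded by tfds.as_numpy) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def to_pairs(example: Mapping) -> List[Tuple[str, str]]:
    """Flatten one example into (original, simplification) pairs."""
    original = _to_str(example["original"])
    return [(original, _to_str(s)) for s in example["simplifications"]]


def iter_asset_pairs(split: str = "validation") -> Iterator[Tuple[str, str]]:
    """Yield pairs from the asset/simplification config.

    Imported lazily so the helpers above work without TensorFlow installed.
    Calling this function triggers a dataset download on first use.
    """
    import tensorflow_datasets as tfds

    ds = tfds.load("asset/simplification", split=split)
    for example in tfds.as_numpy(ds):
        yield from to_pairs(example)
```

Passing `split="test"` to `iter_asset_pairs` would iterate over the 359 test sentences instead of the validation split.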

asset/ratings

  • Config description: Human ratings of automatically produced text simplifications.

  • Dataset size: 1.44 MiB

  • Splits:

Split    Examples
'full'   4,500

  • Features:
FeaturesDict({
    'aspect': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'original': Text(shape=(), dtype=tf.string),
    'original_sentence_id': tf.int32,
    'rating': tf.int32,
    'simplification': Text(shape=(), dtype=tf.string),
    'worker_id': tf.int32,
})
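
Because `aspect` is stored as an integer class label, summarising the ratings usually means mapping it back to a name first. The sketch below averages `rating` per aspect; `mean_rating_by_aspect` and `load_asset_ratings` are hypothetical helper names, and the aspect names are read from the dataset metadata rather than hard-coded, since their order is not stated above. It assumes the `tensorflow_datasets` package is installed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence


def mean_rating_by_aspect(
    records: Iterable[Mapping], aspect_names: Sequence[str]
) -> Dict[str, float]:
    """Average the 'rating' field for each aspect label found in records."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum of ratings, count]
    for record in records:
        name = aspect_names[int(record["aspect"])]
        totals[name][0] += int(record["rating"])
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


def load_asset_ratings():
    """Return (iterable of records, aspect names) for the asset/ratings config.

    Imported lazily so the helper above works without TensorFlow installed.
    Calling this function triggers a dataset download on first use.
    """
    import tensorflow_datasets as tfds

    ds, info = tfds.load("asset/ratings", split="full", with_info=True)
    # ClassLabel features expose their label strings through `.names`.
    return tfds.as_numpy(ds), info.features["aspect"].names
```

A typical call would be `mean_rating_by_aspect(*load_asset_ratings())`, which returns one average per aspect present in the data.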