asset

  • Description:

ASSET is a dataset for evaluating Sentence Simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations." The corpus is composed of 2000 validation and 359 test original sentences that were each simplified 10 times by different annotators. The corpus also contains human judgments of meaning preservation, fluency and simplicity for the outputs of several automatic text simplification systems.

  • Citation:

@inproceedings{alva-manchego-etal-2020-asset,
    title = "{ASSET}: {A} Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations",
    author = "Alva-Manchego, Fernando  and
      Martin, Louis  and
      Bordes, Antoine  and
      Scarton, Carolina  and
      Sagot, Benoit  and
      Specia, Lucia",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.424",
    pages = "4668--4679",
}

asset/simplification (default config)

  • Config description: A set of original sentences aligned with 10 possible simplifications for each.

  • Dataset size: 2.64 MiB

  • Splits:

Split         Examples
'test'        359
'validation'  2,000

  • Feature structure:
FeaturesDict({
    'original': Text(shape=(), dtype=tf.string),
    'simplifications': Sequence(Text(shape=(), dtype=tf.string)),
})
  • Feature documentation:

Feature          Class           Shape    Dtype      Description
                 FeaturesDict
original         Text                     tf.string
simplifications  Sequence(Text)  (None,)  tf.string
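
A minimal sketch of reading this config, assuming the `tensorflow_datasets` package is installed and that the config is addressable as "asset/simplification". The helper names `decode_example` and `iter_validation_pairs` are hypothetical and only serve the illustration. Text features arrive as UTF-8 bytes when the dataset is iterated through `tfds.as_numpy`, so each example is decoded into an original sentence and its list of reference simplifications.

```python
def decode_example(example):
    """Turn one raw example (bytes values) into (original, [simplifications])."""
    original = example["original"].decode("utf-8")
    simplifications = [s.decode("utf-8") for s in example["simplifications"]]
    return original, simplifications


def iter_validation_pairs():
    """Yield decoded pairs from the 'validation' split.

    Assumption: the data is downloaded and prepared on first use.
    """
    # Imported lazily so that decode_example can be used without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("asset/simplification", split="validation")
    for example in tfds.as_numpy(ds):
        yield decode_example(example)


# Made-up placeholder example with the same keys as the feature structure.
raw = {
    "original": b"An original sentence.",
    "simplifications": [b"A simple one.", b"Another one."],
}
original, references = decode_example(raw)
```

Keeping the decoding step separate from the loading step lets the same helper be reused on the 'test' split, or checked without downloading anything.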

asset/ratings

  • Config description: Human ratings of automatically produced text simplifications.

  • Dataset size: 1.44 MiB

  • Splits:

Split   Examples
'full'  4,500

  • Feature structure:
FeaturesDict({
    'aspect': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'original': Text(shape=(), dtype=tf.string),
    'original_sentence_id': tf.int32,
    'rating': tf.int32,
    'simplification': Text(shape=(), dtype=tf.string),
    'worker_id': tf.int32,
})
  • Feature documentation:

Feature               Class         Shape  Dtype      Description
                      FeaturesDict
aspect                ClassLabel           tf.int64
original              Text                 tf.string
original_sentence_id  Tensor               tf.int32
rating                Tensor               tf.int32
simplification        Text                 tf.string
worker_id             Tensor               tf.int32
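
As a sketch of how the ratings might be summarized, the snippet below averages `rating` per `aspect` class index. The helper names `mean_rating_by_aspect` and `load_ratings` are hypothetical, and the sample records are made-up placeholders rather than real ratings. The page does not list the names of the three aspect classes, so the sketch reads them from the dataset metadata instead of hard-coding them; it assumes `tensorflow_datasets` is installed and that the config is addressable as "asset/ratings".

```python
from collections import defaultdict


def mean_rating_by_aspect(examples):
    """Return {aspect class index: mean rating} for an iterable of example dicts."""
    totals = defaultdict(lambda: [0, 0])  # aspect -> [sum of ratings, count]
    for ex in examples:
        entry = totals[int(ex["aspect"])]
        entry[0] += int(ex["rating"])
        entry[1] += 1
    return {aspect: total / count for aspect, (total, count) in totals.items()}


def load_ratings():
    """Load the 'full' split plus the aspect class names from the dataset info."""
    # Imported lazily so that mean_rating_by_aspect works without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load("asset/ratings", split="full", with_info=True)
    return tfds.as_numpy(ds), info.features["aspect"].names


# Made-up placeholder records with the same keys as the feature structure.
sample = [
    {"aspect": 0, "rating": 80},
    {"aspect": 0, "rating": 60},
    {"aspect": 1, "rating": 50},
]
means = mean_rating_by_aspect(sample)
```

With the real data, the class names returned by `load_ratings` can be used to relabel the integer keys of the result.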