asset

  • Description:

ASSET is a dataset for evaluating sentence simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations." The corpus comprises 2,000 validation and 359 test original sentences, each simplified 10 times by different annotators. It also contains human judgments of meaning preservation, fluency, and simplicity for the outputs of several automatic text simplification systems.

  • Citation:

@inproceedings{alva-manchego-etal-2020-asset,
    title = "{ASSET}: {A} Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations",
    author = "Alva-Manchego, Fernando  and
      Martin, Louis  and
      Bordes, Antoine  and
      Scarton, Carolina  and
      Sagot, Benoit  and
      Specia, Lucia",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.424",
    pages = "4668--4679",
}

asset/simplification (default config)

  • Config description: A set of original sentences aligned with 10 possible simplifications for each.

  • Dataset size: 2.64 MiB

  • Splits:

Split         Examples
'test'        359
'validation'  2,000
  • Feature structure:
FeaturesDict({
    'original': Text(shape=(), dtype=string),
    'simplifications': Sequence(Text(shape=(), dtype=string)),
})
  • Feature documentation:
Feature          Class           Shape    Dtype   Description
                 FeaturesDict
original         Text                     string
simplifications  Sequence(Text)  (None,)  string

asset/ratings

  • Config description: Human ratings of automatically produced text simplification.

  • Dataset size: 1.44 MiB

  • Splits:

Split   Examples
'full'  4,500
  • Feature structure:
FeaturesDict({
    'aspect': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'original': Text(shape=(), dtype=string),
    'original_sentence_id': int32,
    'rating': int32,
    'simplification': Text(shape=(), dtype=string),
    'worker_id': int32,
})
  • Feature documentation:
Feature               Class         Shape  Dtype   Description
                      FeaturesDict
aspect                ClassLabel           int64
original              Text                 string
original_sentence_id  Tensor               int32
rating                Tensor               int32
simplification        Text                 string
worker_id             Tensor               int32
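
Each record in the ratings config carries one integer `rating` for one `aspect` of one system output. A minimal sketch of a per-aspect aggregation follows, assuming records are dictionaries with the keys listed above. The function name `mean_rating_by_aspect` and the toy values are hypothetical, and aspects are kept as integer class indices because the label names are not listed in this section.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def mean_rating_by_aspect(records: Iterable[Dict[str, Any]]) -> Dict[int, float]:
    """Average the 'rating' field separately for each 'aspect' class index."""
    totals = defaultdict(lambda: [0, 0])  # aspect index -> [sum of ratings, count]
    for record in records:
        bucket = totals[int(record["aspect"])]
        bucket[0] += int(record["rating"])
        bucket[1] += 1
    return {aspect: total / count for aspect, (total, count) in totals.items()}


# Toy records using the same keys as the feature structure (made-up values).
toy_ratings = [
    {"aspect": 0, "rating": 80, "original_sentence_id": 1, "worker_id": 7},
    {"aspect": 0, "rating": 60, "original_sentence_id": 1, "worker_id": 9},
    {"aspect": 2, "rating": 50, "original_sentence_id": 1, "worker_id": 7},
]

means = mean_rating_by_aspect(toy_ratings)
print(means)
```

Grouping by `original_sentence_id` or `worker_id` instead would follow the same pattern with a different key.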