samsum

  • Description:

The SAMSum Corpus contains over 16k chat dialogues with manually annotated summaries.

There are three features:

  • dialogue: text of the dialogue.
  • summary: human-written summary of the dialogue.
  • id: ID of an example.
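The dialogue feature is a single string holding the whole chat. The minimal sketch below splits such a string into (speaker, utterance) turns. It assumes one turn per line in a "Speaker: utterance" form; that layout, the function name split_turns, and the rule of folding colon-less lines into the previous turn are illustrative assumptions, not part of the TFDS API.

```python
from typing import List, Tuple


def split_turns(dialogue: str) -> List[Tuple[str, str]]:
    """Split a dialogue string into (speaker, utterance) pairs.

    Hypothetical helper: assumes one turn per line written as
    "Speaker: utterance". Lines without a colon are treated as a
    continuation of the previous turn.
    """
    turns: List[Tuple[str, str]] = []
    for line in dialogue.splitlines():  # handles both "\n" and "\r\n"
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so utterances may contain colons.
        speaker, sep, utterance = line.partition(":")
        if sep and speaker:
            turns.append((speaker.strip(), utterance.strip()))
        elif turns:
            # Continuation line: append it to the previous utterance.
            prev_speaker, prev_text = turns[-1]
            turns[-1] = (prev_speaker, (prev_text + " " + line).strip())
        else:
            # Text before any speaker label: keep it with an empty speaker.
            turns.append(("", line))
    return turns
```

Splitting on the first colon is a heuristic: a line such as a bare URL would be misread as a speaker label, so treat the output as approximate.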

  • Homepage: https://arxiv.org/src/1911.12237v2/anc

  • Source code: tfds.summarization.Samsum

  • Versions:

    • 1.0.0 (default): No release notes.

  • Download size: Unknown size

  • Dataset size: Unknown size

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/download/manual/):
    Download https://arxiv.org/src/1911.12237v2/anc/corpus.7z, decompress it, and place train.json, val.json and test.json in the manual folder.
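Before running the TFDS build, it can help to confirm that the three decompressed files are actually in the manual folder. The sketch below checks for them and reads one split. The helper names missing_manual_files and load_split are hypothetical, and the file layout (a JSON list of objects with "id", "dialogue" and "summary" keys) is an assumption about the raw corpus, not something this page states.

```python
import json
from pathlib import Path
from typing import Dict, List

# File names taken from the manual download instructions above.
EXPECTED_FILES = ("train.json", "val.json", "test.json")


def missing_manual_files(manual_dir: str) -> List[str]:
    """Return the expected SAMSum files that are not present in manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_split(path: str) -> List[Dict[str, str]]:
    """Load one split file.

    Assumes (unverified) that the file is a JSON list of objects with
    'id', 'dialogue' and 'summary' keys; ids are normalized to str.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [
        {"id": str(r["id"]), "dialogue": r["dialogue"], "summary": r["summary"]}
        for r in records
    ]
```

For example, missing_manual_files("~/tensorflow_datasets/download/manual/") returns an empty list once all three files are in place.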

  • Auto-cached (documentation): Unknown

  • Splits:

Split Examples
  • Features:
FeaturesDict({
    'dialogue': Text(shape=(), dtype=tf.string),
    'id': Text(shape=(), dtype=tf.string),
    'summary': Text(shape=(), dtype=tf.string),
})
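All three features are Text with dtype tf.string, so when a loaded dataset is iterated with tfds.as_numpy each value arrives as Python bytes. The sketch below converts one such example to plain strings. The helper name decode_example is hypothetical, and the UTF-8 decoding is an assumption about the text encoding.

```python
from typing import Dict, Mapping, Union

# The three feature keys listed in the FeaturesDict above.
FEATURE_KEYS = ("dialogue", "id", "summary")


def decode_example(example: Mapping[str, Union[bytes, str]]) -> Dict[str, str]:
    """Return the SAMSum features as str, decoding bytes as UTF-8 (assumed)."""
    out: Dict[str, str] = {}
    for key in FEATURE_KEYS:
        value = example[key]
        out[key] = value.decode("utf-8") if isinstance(value, bytes) else value
    return out


# Typical usage (not executed here; needs tensorflow_datasets and the
# manual files described above):
#   import tensorflow_datasets as tfds
#   config = tfds.download.DownloadConfig(manual_dir="/path/to/manual")
#   ds = tfds.load("samsum", split="train",
#                  download_and_prepare_kwargs={"download_config": config})
#   for ex in tfds.as_numpy(ds.take(1)):
#       print(decode_example(ex)["summary"])
```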
  • Citation:

@article{gliwa2019samsum,
  title={SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization},
  author={Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej and Wawer, Aleksander},
  journal={arXiv preprint arXiv:1911.12237},
  year={2019}
}