reddit_tifu

  • Description:

Reddit dataset, where TIFU denotes the name of the subreddit /r/tifu. As defined in the publication, the "short" style uses the title as the summary and the "long" style uses the tldr as the summary.

Features include:

  • documents: post text without tldr.
  • tldr: tldr line.
  • title: trimmed title without tldr.
  • ups: upvotes.
  • score: score.
  • num_comments: number of comments.
  • upvote_ratio: upvote ratio.

  • Homepage: https://github.com/ctr4si/MMN

  • Source code: tfds.summarization.RedditTifu

  • Versions:

    • 1.1.0 (default): No release notes.
  • Download size: 639.54 MiB

  • Dataset size: Unknown size

  • Auto-cached (documentation): Unknown

  • Features:

FeaturesDict({
    'documents': Text(shape=(), dtype=tf.string),
    'num_comments': tf.float32,
    'score': tf.float32,
    'title': Text(shape=(), dtype=tf.string),
    'tldr': Text(shape=(), dtype=tf.string),
    'ups': tf.float32,
    'upvote_ratio': tf.float32,
})
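
The description says the "short" style uses the title as the summary and the "long" style uses the tldr. A minimal sketch of that pairing over the feature keys listed above is shown here. The helper name to_summarization_pair, the SUMMARY_KEY mapping, and the sample values are hypothetical and only illustrate the idea; this is not the dataset builder's own code.

```python
from typing import Dict, Tuple

# Assumed mapping from style name to the feature used as the summary,
# following the description: "short" -> title, "long" -> tldr.
SUMMARY_KEY = {"short": "title", "long": "tldr"}


def to_summarization_pair(example: Dict[str, str], style: str) -> Tuple[str, str]:
    """Return (source_text, summary) for one example (hypothetical helper)."""
    if style not in SUMMARY_KEY:
        raise ValueError(f"unknown style: {style!r}")
    # 'documents' is the post text without the tldr line.
    return example["documents"], example[SUMMARY_KEY[style]]


# Made-up values, for illustration only (not taken from the dataset).
sample = {
    "documents": "post body goes here",
    "title": "a short title",
    "tldr": "a one-line tldr",
}

short_pair = to_summarization_pair(sample, "short")
long_pair = to_summarization_pair(sample, "long")
```

Both pairs share the same source text and differ only in which field serves as the target summary.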
  • Citation:
@misc{kim2018abstractive,
    title={Abstractive Summarization of Reddit Posts with Multi-level Memory Networks},
    author={Byeongchang Kim and Hyunwoo Kim and Gunhee Kim},
    year={2018},
    eprint={1811.00783},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

reddit_tifu/short (default config)

  • Config description: Using title as summary.

  • Splits:

Split Examples
'train' 79,740

reddit_tifu/long

  • Config description: Using TLDR as summary.

  • Splits:

Split Examples
'train' 42,139
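
Both configs ship only a 'train' split, so any validation or test set has to be carved out by the user. The sketch below builds TFDS split-slicing strings for that purpose and passes them to tfds.load. The 80/10/10 proportions and the helper names make_split_specs and load_reddit_tifu are assumptions chosen for illustration, not a partition defined by the dataset or the publication. It also assumes tensorflow_datasets is installed; the import is deferred so that the string helper can be used on its own.

```python
from typing import Dict


def make_split_specs(train_pct: int = 80, val_pct: int = 10) -> Dict[str, str]:
    """Build split-slicing strings that partition 'train' by percent."""
    if train_pct <= 0 or val_pct <= 0 or train_pct + val_pct >= 100:
        raise ValueError("percentages must be positive and leave room for a test slice")
    cut = train_pct + val_pct
    return {
        "train": f"train[:{train_pct}%]",
        "validation": f"train[{train_pct}%:{cut}%]",
        "test": f"train[{cut}%:]",
    }


def load_reddit_tifu(config: str = "long", train_pct: int = 80, val_pct: int = 10):
    """Load each slice with tfds.load (hypothetical wrapper; downloads on first use)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    specs = make_split_specs(train_pct, val_pct)
    return {
        name: tfds.load(f"reddit_tifu/{config}", split=spec)
        for name, spec in specs.items()
    }


specs = make_split_specs()
```

Calling load_reddit_tifu("short") would apply the same slicing to the default config.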