
reddit_tifu

Reddit dataset, where TIFU denotes the name of the subreddit /r/tifu. As defined in the publication, the "short" style uses the title as the summary and the "long" style uses the tldr as the summary.

Features include:

  • documents: post text without tldr.
  • tldr: tldr line.
  • title: trimmed title without tldr.
  • ups: upvotes.
  • score: score.
  • num_comments: number of comments.
  • upvote_ratio: upvote ratio.

reddit_tifu is configured with tfds.summarization.reddit_tifu.RedditTifuConfig and has the following configurations predefined (defaults to the first one):

  • short (v1.1.0) (Size: 639.54 MiB): Using title as summary.

  • long (v1.1.0) (Size: 639.54 MiB): Using TLDR as summary.
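A configuration is normally selected by appending its name to the dataset name. The following is a minimal sketch, assuming `tensorflow_datasets` is installed and that `tfds.load` accepts the `"reddit_tifu/<config>"` form; the helper names `dataset_name` and `load_reddit_tifu` and the `SUMMARY_FIELD` table are illustrative and not part of the library.

```python
# Maps each predefined config to the feature it treats as the summary
# (taken from the config descriptions above).
SUMMARY_FIELD = {"short": "title", "long": "tldr"}


def dataset_name(config="short"):
    """Return the TFDS name string for a reddit_tifu config."""
    if config not in SUMMARY_FIELD:
        raise ValueError(f"unknown reddit_tifu config: {config!r}")
    return f"reddit_tifu/{config}"


def load_reddit_tifu(config="short"):
    """Load the TRAIN split of the chosen config.

    Assumption: tensorflow_datasets is installed. The import is deferred
    so the helpers above work without it. The first call downloads data.
    """
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split="train")
```

Calling `load_reddit_tifu("long")` would then request the TLDR-as-summary variant, while the default argument mirrors the page's statement that the first configuration is the default.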

reddit_tifu/short

Using title as summary.

Versions:

  • 1.1.0 (default)

Statistics

Split Examples
ALL 79,740
TRAIN 79,740
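
Because only a TRAIN split is provided, a held-out set has to be carved out by the user. The sketch below builds percent-based split strings, assuming the installed TFDS version supports the `train[:80%]` slicing syntax. The 80/10/10 proportions and the helper name `percent_splits` are illustrative choices, not the splits used in the publication.

```python
def percent_splits(train_pct=80, val_pct=10):
    """Build split strings that carve TRAIN into train/validation/test."""
    if train_pct <= 0 or val_pct < 0 or train_pct + val_pct >= 100:
        raise ValueError("percentages must leave room for a test slice")
    cut = train_pct + val_pct
    return {
        "train": f"train[:{train_pct}%]",
        "validation": f"train[{train_pct}%:{cut}%]",
        "test": f"train[{cut}%:]",
    }


# Hypothetical usage (downloads the dataset on first call):
#   import tensorflow_datasets as tfds
#   val = tfds.load("reddit_tifu/short", split=percent_splits()["validation"])
```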

Features

FeaturesDict({
    'documents': Text(shape=(), dtype=tf.string),
    'num_comments': Tensor(shape=[], dtype=tf.float32),
    'score': Tensor(shape=[], dtype=tf.float32),
    'title': Text(shape=(), dtype=tf.string),
    'tldr': Text(shape=(), dtype=tf.string),
    'ups': Tensor(shape=[], dtype=tf.float32),
    'upvote_ratio': Tensor(shape=[], dtype=tf.float32),
})
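
As a rough illustration of how these features look once read, the sketch below converts one example into plain Python values. It assumes the example arrives as a dict of NumPy-style values, for instance from `tfds.as_numpy`, where `tf.string` scalars appear as `bytes` and the numeric features as floats. `decode_example` is a hypothetical helper.

```python
TEXT_KEYS = ("documents", "title", "tldr")
NUMBER_KEYS = ("num_comments", "score", "ups", "upvote_ratio")


def decode_example(example):
    """Convert one example dict to str and float values."""
    decoded = {}
    for key in TEXT_KEYS:
        value = example[key]
        # tf.string scalars surface as bytes; decode them as UTF-8.
        decoded[key] = value.decode("utf-8") if isinstance(value, bytes) else str(value)
    for key in NUMBER_KEYS:
        # All numeric features are declared as tf.float32 scalars.
        decoded[key] = float(example[key])
    return decoded
```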

Homepage

Supervised keys (for as_supervised=True)

(u'documents', u'title')
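
With `as_supervised=True`, each example is a 2-tuple ordered as the supervised keys above (input first, target second) rather than a feature dict. A minimal sketch of consuming such tuples, assuming they have already been converted to `bytes` values (for example with `tfds.as_numpy`); `iter_pairs` is a hypothetical helper:

```python
def iter_pairs(supervised_examples):
    """Yield (document, summary) string pairs from 2-tuples of bytes."""
    for document, summary in supervised_examples:
        yield document.decode("utf-8"), summary.decode("utf-8")


# Hypothetical usage (downloads the dataset on first call):
#   import tensorflow_datasets as tfds
#   ds = tfds.load("reddit_tifu/short", split="train", as_supervised=True)
#   for document, title in iter_pairs(tfds.as_numpy(ds)):
#       ...
```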

reddit_tifu/long

Using TLDR as summary.

Versions:

  • 1.1.0 (default)

Statistics

Split Examples
ALL 42,139
TRAIN 42,139

Features

FeaturesDict({
    'documents': Text(shape=(), dtype=tf.string),
    'num_comments': Tensor(shape=[], dtype=tf.float32),
    'score': Tensor(shape=[], dtype=tf.float32),
    'title': Text(shape=(), dtype=tf.string),
    'tldr': Text(shape=(), dtype=tf.string),
    'ups': Tensor(shape=[], dtype=tf.float32),
    'upvote_ratio': Tensor(shape=[], dtype=tf.float32),
})

Homepage

Supervised keys (for as_supervised=True)

(u'documents', u'tldr')

Citation

@misc{kim2018abstractive,
    title={Abstractive Summarization of Reddit Posts with Multi-level Memory Networks},
    author={Byeongchang Kim and Hyunwoo Kim and Gunhee Kim},
    year={2018},
    eprint={1811.00783},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}