- Description:
Reddit dataset, where TIFU denotes the name of the subreddit /r/tifu. As defined in the publication, style "short" uses the title as the summary and style "long" uses the tldr as the summary.
Features include:
  - documents: post text without the tldr.
  - tldr: the tldr line.
  - title: trimmed title without the tldr.
  - ups: upvotes.
  - score: score.
  - num_comments: number of comments.
  - upvote_ratio: upvote ratio.
- Homepage: https://github.com/ctr4si/MMN
- Source code: tfds.summarization.RedditTifu
- Versions:
  - 1.1.0 (default): No release notes.
- Download size: 639.54 MiB
- Dataset size: Unknown size
- Auto-cached (documentation): Unknown
- Feature structure:
FeaturesDict({
'documents': Text(shape=(), dtype=tf.string),
'num_comments': tf.float32,
'score': tf.float32,
'title': Text(shape=(), dtype=tf.string),
'tldr': Text(shape=(), dtype=tf.string),
'ups': tf.float32,
'upvote_ratio': tf.float32,
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| documents | Text | () | tf.string | |
| num_comments | Tensor | () | tf.float32 | |
| score | Tensor | () | tf.float32 | |
| title | Text | () | tf.string | |
| tldr | Text | () | tf.string | |
| ups | Tensor | () | tf.float32 | |
| upvote_ratio | Tensor | () | tf.float32 | |
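When preprocessing records outside of TFDS, it can help to check each one against the feature table above. The following is a minimal pure-Python sketch, assuming examples have already been decoded to plain Python values (`str` for the Text features, `float` for the tf.float32 scalars). `EXPECTED_TYPES`, `validate_example`, and the sample record are hypothetical illustrations, not part of the TFDS API or the dataset itself.

```python
from typing import Any, Dict, List

# Hypothetical mirror of the FeaturesDict above, assuming decoded plain Python
# values: str for Text features, float for the tf.float32 scalar features.
EXPECTED_TYPES: Dict[str, type] = {
    "documents": str,
    "num_comments": float,
    "score": float,
    "title": str,
    "tldr": str,
    "ups": float,
    "upvote_ratio": float,
}


def validate_example(example: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems: List[str] = []
    # dict key views support set arithmetic, so missing/extra keys are set differences.
    for name in sorted(EXPECTED_TYPES.keys() - example.keys()):
        problems.append(f"missing feature: {name}")
    for name in sorted(example.keys() - EXPECTED_TYPES.keys()):
        problems.append(f"unexpected feature: {name}")
    # Type checks run in schema order, only for features that are present.
    for name, expected in EXPECTED_TYPES.items():
        if name in example and not isinstance(example[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(example[name]).__name__}"
            )
    return problems


# Invented sample record, for illustration only (not taken from the dataset).
sample = {
    "documents": "so this happened this morning.",
    "num_comments": 12.0,
    "score": 150.0,
    "title": "forgetting my keys",
    "tldr": "locked myself out before work.",
    "ups": 150.0,
    "upvote_ratio": 0.95,
}
print(validate_example(sample))  # []
```

Note that integers would be reported as type mismatches here, since `isinstance(1, float)` is `False`; loosen the check if the source data stores counts as ints.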
- Figure (tfds.show_examples): Not supported.
- Examples (tfds.as_dataframe): Missing.
- Citation:
@misc{kim2018abstractive,
title={Abstractive Summarization of Reddit Posts with Multi-level Memory Networks},
author={Byeongchang Kim and Hyunwoo Kim and Gunhee Kim},
year={2018},
eprint={1811.00783},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
reddit_tifu/short (default config)
- Config description: Using title as summary.
- Splits:

Split | Examples |
---|---|
'train' | 79,740 |

- Supervised keys (See as_supervised doc): ('documents', 'title')
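Only a 'train' split is listed, so any validation or test set has to be carved out by the user. TFDS's split-slicing syntax (for example `split="train[:90%]"`) is the built-in route; as an alternative, here is a minimal pure-Python sketch of a deterministic hash-based hold-out. Keying on the `documents` text means exact duplicate posts always land in the same split. The helper names `is_validation` and `split_train_val`, the 10% default, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from typing import Iterable, List, Tuple


def is_validation(key: str, val_fraction: float = 0.1) -> bool:
    """Deterministically assign a record to validation based on a hash of `key`."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the bucket lies strictly in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < val_fraction


def split_train_val(
    records: Iterable[dict], key_field: str = "documents", val_fraction: float = 0.1
) -> Tuple[List[dict], List[dict]]:
    """Split records into (train, val); the same key always goes to the same side."""
    train: List[dict] = []
    val: List[dict] = []
    for record in records:
        (val if is_validation(record[key_field], val_fraction) else train).append(record)
    return train, val


# Usage with invented records.
posts = [{"documents": f"example post {i}"} for i in range(10)]
train, val = split_train_val(posts)
print(len(train), len(val))
```

Unlike Python's built-in `hash()`, which is salted per process for strings, a `hashlib` digest gives the same assignment on every run and machine.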
reddit_tifu/long
- Config description: Using TLDR as summary.
- Splits:

Split | Examples |
---|---|
'train' | 42,139 |

- Supervised keys (See as_supervised doc): ('documents', 'tldr')
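The two configs differ only in which field serves as the summary target. With the real library, `tfds.load("reddit_tifu/long", split="train", as_supervised=True)` yields these (input, target) pairs directly. The pure-Python sketch below mirrors that selection on a plain dict, using the supervised keys listed for each config above; `SUPERVISED_KEYS` and `to_supervised` are hypothetical names, not TFDS API.

```python
from typing import Dict, Tuple

# Supervised keys per config, as listed on this page.
SUPERVISED_KEYS: Dict[str, Tuple[str, str]] = {
    "reddit_tifu/short": ("documents", "title"),
    "reddit_tifu/long": ("documents", "tldr"),
}


def to_supervised(example: Dict[str, object], config: str) -> Tuple[object, object]:
    """Return the (input, target) pair for `config`; raises KeyError for unknown configs."""
    input_key, target_key = SUPERVISED_KEYS[config]
    return example[input_key], example[target_key]


# Invented record, for illustration only.
post = {
    "documents": "I left my keys inside.",
    "title": "locking myself out",
    "tldr": "keys inside, me outside.",
}
print(to_supervised(post, "reddit_tifu/long"))  # ('I left my keys inside.', 'keys inside, me outside.')
```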