xsum (Manual download)

Extreme Summarization (XSum) Dataset.

There are two features:

- document: Input news article.
- summary: One sentence summary of the article.

This data needs to be manually downloaded and extracted as described in https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/README.md. The folder 'xsum-extracts-from-downloads' then needs to be compressed as 'xsum-extracts-from-downloads.tar.gz' and placed in the manual download folder.

WARNING: This dataset requires you to download the source data manually into manual_dir (defaults to ~/tensorflow_datasets/manual/xsum/). Detailed download instructions (which require running a custom script) are here: https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/README.md#running-the-download-and-extraction-scripts. Afterwards, place the xsum-extracts-from-downloads.tar.gz file in manual_dir.
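The compression step can be sketched in Python as follows. This is a minimal sketch, not part of the official instructions: the helper name package_xsum_extracts is hypothetical, and it assumes the archive should keep 'xsum-extracts-from-downloads' as its top-level folder, which is worth checking against the dataset builder you are using.

```python
import tarfile
from pathlib import Path

ARCHIVE_NAME = "xsum-extracts-from-downloads.tar.gz"
# Default manual_dir as stated in the warning above.
DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "manual" / "xsum"


def package_xsum_extracts(extracts_dir, manual_dir=DEFAULT_MANUAL_DIR):
    """Compress the extracted XSum folder into manual_dir (hypothetical helper).

    extracts_dir: path to the 'xsum-extracts-from-downloads' folder produced
                  by the XSum download and extraction scripts.
    manual_dir:   folder in which the .tar.gz archive is written.
    Returns the path of the archive that was written.
    """
    extracts_dir = Path(extracts_dir)
    manual_dir = Path(manual_dir).expanduser()
    if not extracts_dir.is_dir():
        raise FileNotFoundError(f"Not a directory: {extracts_dir}")

    manual_dir.mkdir(parents=True, exist_ok=True)
    archive_path = manual_dir / ARCHIVE_NAME

    # Assumption: the folder itself is kept as the top-level archive entry.
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(extracts_dir, arcname=extracts_dir.name)
    return archive_path
```

For example, package_xsum_extracts("XSum-Dataset/xsum-extracts-from-downloads") would write the archive to the default manual_dir; the input path here is illustrative only.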

Features

FeaturesDict({
    'document': Text(shape=(), dtype=tf.string),
    'summary': Text(shape=(), dtype=tf.string),
})

Statistics

Split Examples
ALL 226,183
TRAIN 203,577
VALIDATION 11,305
TEST 11,301

Homepage

Supervised keys (for as_supervised=True)

(u'document', u'summary')
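
A minimal sketch of how the supervised keys surface when loading, assuming tensorflow_datasets is installed and the archive is already in manual_dir. The wrapper name load_xsum_pairs is hypothetical; tfds.load and its as_supervised argument are the only library calls used.

```python
def load_xsum_pairs(split="train"):
    """Return the requested XSum split as (document, summary) pairs."""
    # Imported lazily so this module can be imported without TensorFlow.
    import tensorflow_datasets as tfds

    # With as_supervised=True, each element is a 2-tuple ordered by the
    # supervised keys ('document', 'summary') instead of a feature dict.
    return tfds.load("xsum", split=split, as_supervised=True)
```

Iterating over tfds.as_numpy(load_xsum_pairs("validation").take(1)) would then yield one (document, summary) pair as byte strings.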

Citation

@article{Narayan2018DontGM,
  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},
  author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.08745}
}