- Description:
Extreme Summarization (XSum) Dataset.
There are two features: document (the input news article) and summary (a one-sentence summary of the article).
This data needs to be manually downloaded and extracted as described in https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/README.md
The resulting folder 'xsum-extracts-from-downloads' then needs to be compressed as 'xsum-extracts-from-downloads.tar.gz' and placed in the manual download folder.
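The compression step can be done with Python's standard tarfile module. This is a minimal sketch, assuming the archive should contain the 'xsum-extracts-from-downloads' folder itself as its top-level entry; that layout is an assumption, so check the TFDS builder source if loading fails. The helper name make_xsum_archive is hypothetical.

```python
import tarfile
from pathlib import Path


def make_xsum_archive(src_dir, dest_dir):
    """Compress the extracted XSum folder into xsum-extracts-from-downloads.tar.gz.

    Hypothetical helper: `src_dir` is the 'xsum-extracts-from-downloads' folder
    produced by the XSum extraction scripts; `dest_dir` is where the archive is
    written (for example, the TFDS manual download directory).
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "xsum-extracts-from-downloads.tar.gz"
    # "w:gz" writes a gzip-compressed tar. arcname keeps the folder name as the
    # top-level entry instead of embedding the full source path.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    return archive
```

For example, make_xsum_archive("xsum-extracts-from-downloads", Path.home() / "tensorflow_datasets" / "downloads" / "manual") writes the archive straight into the default manual directory.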
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset
Source code: tfds.summarization.Xsum
Versions:
- 1.0.0: Dataset without cleaning.
- 1.1.0 (default): Removes web contents.
Download size: 2.59 MiB
Dataset size: 512.03 MiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/).
Detailed download instructions (which require running a custom script) are here: https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/README.md#running-the-download-and-extraction-scripts
Afterwards, put the xsum-extracts-from-downloads.tar.gz file in the manual_dir.
Auto-cached (documentation): No
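Before building the dataset, it can help to confirm that the archive is where the loader will look. A minimal sketch using only the standard library; find_xsum_archive is a hypothetical helper, and it assumes the default location quoted above when no directory is given.

```python
from pathlib import Path
from typing import Optional

ARCHIVE_NAME = "xsum-extracts-from-downloads.tar.gz"


def find_xsum_archive(manual_dir: Optional[str] = None) -> Path:
    """Return the path of the XSum archive inside the manual directory.

    Hypothetical helper. With manual_dir=None it falls back to the documented
    default, ~/tensorflow_datasets/downloads/manual/. Raises FileNotFoundError
    if the archive has not been placed there yet.
    """
    if manual_dir:
        base = Path(manual_dir).expanduser()
    else:
        base = Path.home() / "tensorflow_datasets" / "downloads" / "manual"
    archive = base / ARCHIVE_NAME
    if not archive.is_file():
        raise FileNotFoundError(f"Expected {archive}; see the XSum download instructions.")
    return archive
```

If you use a non-default directory, the same path is what you would pass as manual_dir in a tfds.download.DownloadConfig when preparing the dataset.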
Splits:
Split | Examples |
---|---|
'test' | 11,301 |
'train' | 203,577 |
'validation' | 11,305 |
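The three splits add up to 226,183 examples, which works out to roughly a 90/5/5 train/validation/test partition. A short sketch of that arithmetic, using only the counts from the Splits table:

```python
# Example counts copied from the Splits table.
SPLITS = {"train": 203_577, "validation": 11_305, "test": 11_301}

total = sum(SPLITS.values())
for name, count in SPLITS.items():
    # Share of all examples that falls in each split.
    print(f"{name:<10} {count:>7,} {count / total:6.2%}")
print(f"{'total':<10} {total:>7,}")
```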
- Feature structure:
FeaturesDict({
'document': Text(shape=(), dtype=string),
'summary': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
| | FeaturesDict | | | |
document | Text | () | string | |
summary | Text | () | string | |
Supervised keys (see as_supervised doc): ('document', 'summary')
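With as_supervised=True, tfds.load yields (input, target) tuples built from the supervised keys instead of feature dictionaries. The selection step can be sketched in plain Python; to_supervised and the sample record are hypothetical, and this is not the TFDS implementation. In practice, tfds.load("xsum", split="train", as_supervised=True) performs this mapping for you.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Supervised keys documented for XSum: (input feature, target feature).
SUPERVISED_KEYS = ("document", "summary")


def to_supervised(examples: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, str]]:
    """Turn feature dicts into (document, summary) tuples.

    Hypothetical plain-Python stand-in for what as_supervised=True does.
    """
    input_key, target_key = SUPERVISED_KEYS
    for ex in examples:
        yield ex[input_key], ex[target_key]


# Made-up record with the same structure as the FeaturesDict.
sample = [{"document": "Text of a news article.", "summary": "One-sentence summary."}]
for document, summary in to_supervised(sample):
    print(summary)
```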
Figure (tfds.show_examples): Not supported.
- Citation:
@article{Narayan2018DontGM,
title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},
author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},
journal={ArXiv},
year={2018},
volume={abs/1808.08745}
}