
xsum

Description:

Extreme Summarization (XSum) Dataset.

There are two features:
- document: Input news article.
- summary: One sentence summary of the article.

This data needs to be manually downloaded and extracted as described in https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/README.md. The folder 'xsum-extracts-from-downloads' then needs to be compressed as 'xsum-extracts-from-downloads.tar.gz' and placed in the manual download folder.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset
Source code: tfds.summarization.Xsum
Versions:
- 1.0.0: Dataset without cleaning.
- 1.1.0 (default): Removes web contents.
Download size: 2.59 MiB
Dataset size: 512.03 MiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Detailed download instructions (which require running a custom script) are here: https://github.com/EdinburghNLP/XSum/blob/master/XSum-Dataset/README.md#running-the-download-and-extraction-scripts. Afterwards, put the xsum-extracts-from-downloads.tar.gz file in manual_dir.
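
The packaging step can be sketched in Python once the upstream scripts have produced the 'xsum-extracts-from-downloads' folder. This is a minimal sketch, not part of TFDS: the helper name pack_xsum_extracts is hypothetical, and it assumes the archive should contain the folder itself as its top-level entry.

```python
import tarfile
from pathlib import Path

ARCHIVE_NAME = "xsum-extracts-from-downloads.tar.gz"
FOLDER_NAME = "xsum-extracts-from-downloads"


def pack_xsum_extracts(extracts_dir, manual_dir="~/tensorflow_datasets/downloads/manual"):
    """Compress the extracted XSum folder into manual_dir (hypothetical helper).

    Assumption: the archive keeps 'xsum-extracts-from-downloads' as its
    top-level directory, regardless of where the folder lives on disk.
    """
    extracts_dir = Path(extracts_dir).expanduser()
    manual_dir = Path(manual_dir).expanduser()
    if not extracts_dir.is_dir():
        raise FileNotFoundError(f"Extracted folder not found: {extracts_dir}")

    # Create the manual download directory if it does not exist yet.
    manual_dir.mkdir(parents=True, exist_ok=True)
    archive_path = manual_dir / ARCHIVE_NAME

    # Write a gzip-compressed tar archive containing the whole folder.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(extracts_dir, arcname=FOLDER_NAME)
    return archive_path
```

Calling pack_xsum_extracts("path/to/xsum-extracts-from-downloads") writes the archive into the default manual_dir. Pass a different manual_dir if download_config.manual_dir has been changed.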
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	11,301
`'train'`	203,577
`'validation'`	11,305

Feature structure:

FeaturesDict({
    'document': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
document	Text	string
summary	Text	string

Supervised keys (See as_supervised doc): ('document', 'summary')
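
Because the supervised keys are ('document', 'summary'), loading with as_supervised=True yields (document, summary) pairs instead of feature dictionaries. The following is a minimal sketch, assuming tensorflow_datasets is installed and the archive is already in manual_dir; the helper names load_xsum and preview_summaries are hypothetical.

```python
def load_xsum(split="train", manual_dir=None, as_supervised=True):
    """Load an XSum split through tfds.load (hypothetical convenience wrapper)."""
    # Imported lazily so that defining this helper does not require TensorFlow.
    import tensorflow_datasets as tfds

    prepare_kwargs = {}
    if manual_dir is not None:
        # Point TFDS at a non-default location of the manually downloaded archive.
        prepare_kwargs["download_config"] = tfds.download.DownloadConfig(
            manual_dir=str(manual_dir)
        )
    return tfds.load(
        "xsum",
        split=split,
        as_supervised=as_supervised,
        download_and_prepare_kwargs=prepare_kwargs,
    )


def preview_summaries(dataset, n=2):
    """Print the first n summaries of a dataset loaded with as_supervised=True."""
    import tensorflow_datasets as tfds

    # With as_supervised=True each element is a (document, summary) tuple
    # of UTF-8 encoded byte strings.
    for _document, summary in tfds.as_numpy(dataset.take(n)):
        print(summary.decode("utf-8"))
```

For example, preview_summaries(load_xsum("validation")) prints two reference summaries from the validation split. With as_supervised=False, each element is instead a dictionary with the 'document' and 'summary' keys from the feature structure.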
Figure (tfds.show_examples): Not supported.

Citation:

@article{Narayan2018DontGM,
  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},
  author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.08745}
}