gov_report

Description:

The Government Report dataset consists of reports written by government research agencies, including the Congressional Research Service and the U.S. Government Accountability Office.

Additional Documentation: Explore on Papers With Code
Homepage: https://gov-report-data.github.io/
Source code: tfds.summarization.gov_report.GovReport
Versions:
- 1.0.0 (default): Initial release.
Download size: 320.59 MiB
Auto-cached (documentation): No
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{
anonymous2022efficiently,
title={Efficiently Modeling Long Sequences with Structured State Spaces},
author={Anonymous},
booktitle={Submitted to The Tenth International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=uYLFoz1vlAC},
note={under review}
}

gov_report/crs_whitespace (default config)

Config description: CRS report with summary. Structures flattened and joined by whitespace. This is the format used by the original paper.
Dataset size: 349.76 MiB
Splits:

Split	Examples
`'test'`	362
`'train'`	6,514
`'validation'`	362

Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'reports': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
id	Text	string
released_date	Text	string
reports	Text	string
summary	Text	string
title	Text	string

Supervised keys (See as_supervised doc): ('reports', 'summary')
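As a minimal sketch of how these supervised keys might be used, the snippet below builds a config name and loads (reports, summary) pairs with `as_supervised=True`. The helper names `config_name` and `load_pairs` are illustrative and not part of TFDS; `tensorflow_datasets` is imported lazily so that the name helper works without it installed.

```python
AGENCIES = ("crs", "gao")
FORMATS = ("whitespace", "html", "json")


def config_name(agency: str, fmt: str) -> str:
    """Build a TFDS name such as 'gov_report/crs_whitespace' (hypothetical helper)."""
    if agency not in AGENCIES or fmt not in FORMATS:
        raise ValueError(f"unknown config: {agency}_{fmt}")
    return f"gov_report/{agency}_{fmt}"


def load_pairs(agency: str = "crs", fmt: str = "whitespace", split: str = "train"):
    """Load (input, target) pairs as defined by the config's supervised keys."""
    import tensorflow_datasets as tfds  # lazy import: only needed when loading

    # With as_supervised=True each element is a 2-tuple instead of a dict.
    return tfds.load(config_name(agency, fmt), split=split, as_supervised=True)


# Example (not run here, since it needs the data to be downloaded and prepared):
#   for report, summary in load_pairs().take(1):
#       print(summary.numpy().decode("utf-8")[:200])
```

The first call to `load_pairs` would download and prepare the source data (320.59 MiB download), so it is best tried on a machine with enough disk space.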
Examples (tfds.as_dataframe):

gov_report/gao_whitespace

Config description: GAO report with highlight. Structures flattened and joined by whitespace. This is the format used by the original paper.
Dataset size: 690.24 MiB
Splits:

Split	Examples
`'test'`	611
`'train'`	11,005
`'validation'`	612

Feature structure:

FeaturesDict({
    'fastfact': Text(shape=(), dtype=string),
    'highlight': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'published_date': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'report': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
fastfact	Text	string
highlight	Text	string
id	Text	string
published_date	Text	string
released_date	Text	string
report	Text	string
title	Text	string
url	Text	string

Supervised keys (See as_supervised doc): ('report', 'highlight')
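The GAO configs use ('report', 'highlight') as supervised keys, whereas the CRS configs use ('reports', 'summary'). Code that handles both may want a small lookup like the one sketched below. The names `SUPERVISED_KEYS` and `to_pair` are hypothetical, and the example dict stands in for a decoded dataset element.

```python
from typing import Dict, Tuple

# Supervised keys per agency, as listed for each config on this page.
SUPERVISED_KEYS: Dict[str, Tuple[str, str]] = {
    "crs": ("reports", "summary"),
    "gao": ("report", "highlight"),
}


def to_pair(example: Dict[str, str], agency: str) -> Tuple[str, str]:
    """Return (source_text, target_text) for an example from the given agency."""
    source_key, target_key = SUPERVISED_KEYS[agency]
    return example[source_key], example[target_key]


# A made-up GAO-style example with placeholder values.
sample = {"report": "Full report text", "highlight": "Short highlight", "title": "T"}
pair = to_pair(sample, "gao")
```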
Examples (tfds.as_dataframe):

gov_report/crs_html

Config description: CRS report with summary. Structures flattened and joined by newlines, with HTML tags added. Tags are added only for section_title, in a format like <h2>xxx<h2>.
Dataset size: 351.25 MiB
Splits:

Split	Examples
`'test'`	362
`'train'`	6,514
`'validation'`	362

Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'reports': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
id	Text	string
released_date	Text	string
reports	Text	string
summary	Text	string
title	Text	string

Supervised keys (See as_supervised doc): ('reports', 'summary')
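Because the HTML configs mark section titles with heading tags, the titles can be recovered with a regular expression. The sketch below is one possible approach, not the dataset's own tooling. It assumes each title sits on a single line, and it accepts both the `<h2>xxx<h2>` form given in the config description and a conventional `</h2>` closing tag, since the exact closing form is not confirmed here. The function name `section_titles` and the sample text are made up for illustration.

```python
import re
from typing import List, Tuple

# Matches <hN>title<hN> as well as <hN>title</hN>; N is the heading level.
HEADING_RE = re.compile(r"<h(\d)>(.*?)</?h\1>")


def section_titles(text: str) -> List[Tuple[int, str]]:
    """Return (level, title) for each heading-tagged section title in text."""
    return [(int(level), title.strip()) for level, title in HEADING_RE.findall(text)]


# Synthetic text imitating the described format.
sample = "<h2>Introduction<h2>\nSome text.\n<h3>Background<h3>\nMore text."
titles = section_titles(sample)
```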
Examples (tfds.as_dataframe):

gov_report/gao_html

Config description: GAO report with highlight. Structures flattened and joined by newlines, with HTML tags added. Tags are added only for section_title, in a format like <h2>xxx<h2>.
Dataset size: 692.72 MiB
Splits:

Split	Examples
`'test'`	611
`'train'`	11,005
`'validation'`	612

Feature structure:

FeaturesDict({
    'fastfact': Text(shape=(), dtype=string),
    'highlight': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'published_date': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'report': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
fastfact	Text	string
highlight	Text	string
id	Text	string
published_date	Text	string
released_date	Text	string
report	Text	string
title	Text	string
url	Text	string

Supervised keys (See as_supervised doc): ('report', 'highlight')
Examples (tfds.as_dataframe):

gov_report/crs_json

Config description: CRS report with summary. Structures represented as raw JSON.
Dataset size: 361.92 MiB
Splits:

Split	Examples
`'test'`	362
`'train'`	6,514
`'validation'`	362

Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'reports': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
id	Text	string
released_date	Text	string
reports	Text	string
summary	Text	string
title	Text	string

Supervised keys (See as_supervised doc): ('reports', 'summary')
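In the JSON configs the text features hold a raw JSON string, so they need to be decoded before use. The sketch below walks a nested section tree and joins its text with whitespace. The field names `section_title`, `paragraphs`, and `subsections` are assumptions about the JSON layout that this page does not document, and the result is not guaranteed to match the whitespace configs exactly.

```python
import json
from typing import Iterator, List, Tuple


def iter_sections(node: dict, depth: int = 0) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (depth, title, paragraphs) for a section and its subsections, depth-first."""
    yield depth, node.get("section_title", ""), node.get("paragraphs", [])
    for child in node.get("subsections", []):  # assumed field name
        yield from iter_sections(child, depth + 1)


def flatten(raw: str) -> str:
    """Decode a raw JSON report and join titles and paragraphs with spaces."""
    decoded = json.loads(raw)
    # Accept either a single root section or a list of top-level sections.
    roots = decoded if isinstance(decoded, list) else [decoded]
    parts: List[str] = []
    for root in roots:
        for _, title, paragraphs in iter_sections(root):
            if title:
                parts.append(title)
            parts.extend(paragraphs)
    return " ".join(parts)


# Synthetic input following the assumed layout.
raw = json.dumps({
    "section_title": "",
    "paragraphs": ["Intro."],
    "subsections": [
        {"section_title": "Background", "paragraphs": ["P1", "P2"], "subsections": []}
    ],
})
flat = flatten(raw)
```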
Examples (tfds.as_dataframe):

gov_report/gao_json

Config description: GAO report with highlight. Structures represented as raw JSON.
Dataset size: 712.82 MiB
Splits:

Split	Examples
`'test'`	611
`'train'`	11,005
`'validation'`	612

Feature structure:

FeaturesDict({
    'fastfact': Text(shape=(), dtype=string),
    'highlight': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'published_date': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'report': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
fastfact	Text	string
highlight	Text	string
id	Text	string
published_date	Text	string
released_date	Text	string
report	Text	string
title	Text	string
url	Text	string

Supervised keys (See as_supervised doc): ('report', 'highlight')
Examples (tfds.as_dataframe):