gov_report

  • Description:

The Government Report dataset consists of reports written by government research agencies, including the Congressional Research Service (CRS) and the U.S. Government Accountability Office (GAO).

@inproceedings{
anonymous2022efficiently,
title={Efficiently Modeling Long Sequences with Structured State Spaces},
author={Anonymous},
booktitle={Submitted to The Tenth International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=uYLFoz1vlAC},
note={under review}
}

gov_report/crs_whitespace (default config)

  • Config description: CRS report with summary. Structures are flattened and joined by whitespace. This is the format used by the original paper.

  • Dataset size: 349.76 MiB

  • Splits:

Split         Examples
'test'        362
'train'       6,514
'validation'  362
  • Feature structure:
FeaturesDict({
    'id': Text(shape=(), dtype=tf.string),
    'released_date': Text(shape=(), dtype=tf.string),
    'reports': Text(shape=(), dtype=tf.string),
    'summary': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature        Class         Shape  Dtype      Description
               FeaturesDict
id             Text                 tf.string
released_date  Text                 tf.string
reports        Text                 tf.string
summary        Text                 tf.string
title          Text                 tf.string
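
As a rough illustration of how a config such as the default one might be selected and loaded, the sketch below builds the config name from an agency and a format and passes it to `tfds.load`. The helpers `config_name` and `load_gov_report` are hypothetical names introduced here, and the sketch assumes the data can be fetched and prepared automatically on first use.

```python
AGENCIES = ("crs", "gao")
FORMATS = ("whitespace", "html", "json")


def config_name(agency: str = "crs", fmt: str = "whitespace") -> str:
    """Return a name such as 'gov_report/crs_whitespace' (hypothetical helper)."""
    if agency not in AGENCIES:
        raise ValueError(f"unknown agency: {agency!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"gov_report/{agency}_{fmt}"


def load_gov_report(agency: str = "crs", fmt: str = "whitespace", split: str = "train"):
    """Load one split as a tf.data.Dataset (hypothetical helper).

    Assumption: the data is downloaded and prepared automatically on first use.
    """
    # Imported lazily so that config_name can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(config_name(agency, fmt), split=split)
```

Calling `load_gov_report("crs", "whitespace", "validation")` would request the 362-example validation split listed for this config.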

gov_report/gao_whitespace

  • Config description: GAO report with highlight. Structures are flattened and joined by whitespace. This is the format used by the original paper.

  • Dataset size: 690.24 MiB

  • Splits:

Split         Examples
'test'        611
'train'       11,005
'validation'  612
  • Feature structure:
FeaturesDict({
    'fastfact': Text(shape=(), dtype=tf.string),
    'highlight': Text(shape=(), dtype=tf.string),
    'id': Text(shape=(), dtype=tf.string),
    'published_date': Text(shape=(), dtype=tf.string),
    'released_date': Text(shape=(), dtype=tf.string),
    'report': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
    'url': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature         Class         Shape  Dtype      Description
                FeaturesDict
fastfact        Text                 tf.string
highlight       Text                 tf.string
id              Text                 tf.string
published_date  Text                 tf.string
released_date   Text                 tf.string
report          Text                 tf.string
title           Text                 tf.string
url             Text                 tf.string
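
The CRS and GAO configs name their text fields differently: CRS examples carry `reports` and `summary`, while GAO examples carry `report` and `highlight`. A minimal sketch of normalizing both into a (document, summary) pair follows. The names `TEXT_FIELDS` and `to_pair` are hypothetical, and the bytes handling assumes examples were converted to NumPy or Python values first.

```python
# Document and summary feature names per agency, taken from the
# feature structures listed for the CRS and GAO configs.
TEXT_FIELDS = {
    "crs": ("reports", "summary"),
    "gao": ("report", "highlight"),
}


def _as_str(value) -> str:
    """Decode bytes (as yielded by NumPy conversion) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def to_pair(example: dict, agency: str) -> tuple:
    """Return (document, summary) for one example (hypothetical helper)."""
    doc_key, summary_key = TEXT_FIELDS[agency]
    return _as_str(example[doc_key]), _as_str(example[summary_key])
```

With this mapping, downstream code can treat both agencies uniformly without branching on feature names.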

gov_report/crs_html

  • Config description: CRS report with summary. Structures are flattened and joined by newlines, with HTML tags added. Tags are added only for section_title, in a format like <h2>xxx<h2>.

  • Dataset size: 351.25 MiB

  • Splits:

Split         Examples
'test'        362
'train'       6,514
'validation'  362
  • Feature structure:
FeaturesDict({
    'id': Text(shape=(), dtype=tf.string),
    'released_date': Text(shape=(), dtype=tf.string),
    'reports': Text(shape=(), dtype=tf.string),
    'summary': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature        Class         Shape  Dtype      Description
               FeaturesDict
id             Text                 tf.string
released_date  Text                 tf.string
reports        Text                 tf.string
summary        Text                 tf.string
title          Text                 tf.string

gov_report/gao_html

  • Config description: GAO report with highlight. Structures are flattened and joined by newlines, with HTML tags added. Tags are added only for section_title, in a format like <h2>xxx<h2>.

  • Dataset size: 692.72 MiB

  • Splits:

Split         Examples
'test'        611
'train'       11,005
'validation'  612
  • Feature structure:
FeaturesDict({
    'fastfact': Text(shape=(), dtype=tf.string),
    'highlight': Text(shape=(), dtype=tf.string),
    'id': Text(shape=(), dtype=tf.string),
    'published_date': Text(shape=(), dtype=tf.string),
    'released_date': Text(shape=(), dtype=tf.string),
    'report': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
    'url': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature         Class         Shape  Dtype      Description
                FeaturesDict
fastfact        Text                 tf.string
highlight       Text                 tf.string
id              Text                 tf.string
published_date  Text                 tf.string
released_date   Text                 tf.string
report          Text                 tf.string
title           Text                 tf.string
url             Text                 tf.string

gov_report/crs_json

  • Config description: CRS report with summary. Structures are represented as raw JSON.

  • Dataset size: 361.92 MiB

  • Splits:

Split         Examples
'test'        362
'train'       6,514
'validation'  362
  • Feature structure:
FeaturesDict({
    'id': Text(shape=(), dtype=tf.string),
    'released_date': Text(shape=(), dtype=tf.string),
    'reports': Text(shape=(), dtype=tf.string),
    'summary': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature        Class         Shape  Dtype      Description
               FeaturesDict
id             Text                 tf.string
released_date  Text                 tf.string
reports        Text                 tf.string
summary        Text                 tf.string
title          Text                 tf.string
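
In the json configs the text features hold a JSON string rather than flattened text. The sketch below shows how such a string might be decoded and flattened with whitespace. The section layout is an assumption not stated on this page: each section is taken to be an object with `section_title`, `paragraphs`, and `subsections` keys, and the top level may be a single section or a list of sections. The function names are hypothetical, and the output is not guaranteed to match the whitespace configs exactly.

```python
import json


def flatten(node):
    """Yield text pieces depth-first from a section tree (assumed layout)."""
    if isinstance(node, list):
        for child in node:
            yield from flatten(child)
    elif isinstance(node, dict):
        # Assumed keys: section_title, paragraphs, subsections.
        title = node.get("section_title", "")
        if title:
            yield title
        for paragraph in node.get("paragraphs", []):
            yield paragraph
        yield from flatten(node.get("subsections", []))
    elif isinstance(node, str):
        yield node


def json_to_whitespace(raw) -> str:
    """Decode a raw JSON feature value and join its text with single spaces."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return " ".join(flatten(json.loads(raw)))
```

Keeping the raw JSON makes it possible to choose a different flattening, for example one that skips section titles.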

gov_report/gao_json

  • Config description: GAO report with highlight. Structures are represented as raw JSON.

  • Dataset size: 712.82 MiB

  • Splits:

Split         Examples
'test'        611
'train'       11,005
'validation'  612
  • Feature structure:
FeaturesDict({
    'fastfact': Text(shape=(), dtype=tf.string),
    'highlight': Text(shape=(), dtype=tf.string),
    'id': Text(shape=(), dtype=tf.string),
    'published_date': Text(shape=(), dtype=tf.string),
    'released_date': Text(shape=(), dtype=tf.string),
    'report': Text(shape=(), dtype=tf.string),
    'title': Text(shape=(), dtype=tf.string),
    'url': Text(shape=(), dtype=tf.string),
})
  • Feature documentation:
Feature         Class         Shape  Dtype      Description
                FeaturesDict
fastfact        Text                 tf.string
highlight       Text                 tf.string
id              Text                 tf.string
published_date  Text                 tf.string
released_date   Text                 tf.string
report          Text                 tf.string
title           Text                 tf.string
url             Text                 tf.string