
opinion_abstracts

Description:

There are two sub-datasets:

(1) RottenTomatoes: Movie critics and consensus crawled from http://rottentomatoes.com/. It has the fields "_movie_name", "_movie_id", "_critics", and "_critic_consensus".

(2) IDebate: Arguments crawled from http://idebate.org/. It has the fields "_debate_name", "_debate_id", "_claim", "_claim_id", and "_argument_sentences".
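The two sub-datasets correspond to the configs `rotten_tomatoes` and `idebate` listed further down this page. A minimal loading sketch follows. The helper names `dataset_name` and `load_train` are hypothetical and only for illustration; `tfds.load` is the standard TFDS entry point, and the import is deferred so that the name helper works even without TFDS installed.

```python
# Config names as listed on this page.
CONFIGS = ("rotten_tomatoes", "idebate")


def dataset_name(config: str) -> str:
    """Build the 'dataset/config' string accepted by tfds.load (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"opinion_abstracts/{config}"


def load_train(config: str = "rotten_tomatoes"):
    """Load the 'train' split, the only split this page lists (hypothetical helper)."""
    # Imported lazily: calling this function requires tensorflow_datasets
    # and triggers a download on first use.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split="train")
```

Calling `load_train("idebate")` would return a `tf.data.Dataset` of feature dictionaries for the IDebate config.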

Homepage: https://web.eecs.umich.edu/~wangluxy/data.html
Source code: tfds.datasets.opinion_abstracts.Builder
Versions:
- 1.0.0 (default): No release notes.
Download size: 20.08 MiB
Auto-cached (documentation): Yes
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{wang-ling-2016-neural,
    title = "Neural Network-Based Abstract Generation for Opinions and Arguments",
    author = "Wang, Lu  and
      Ling, Wang",
    booktitle = "Proceedings of the 2016 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2016",
    address = "San Diego, California",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N16-1007",
    doi = "10.18653/v1/N16-1007",
    pages = "47--57",
}

opinion_abstracts/rotten_tomatoes (default config)

Config description: Professional critics and consensus of 3,731 movies.
Dataset size: 50.10 MiB
Splits:

Split	Examples
`'train'`	3,731

Feature structure:

FeaturesDict({
    '_critic_consensus': string,
    '_critics': Sequence({
        'key': string,
        'value': string,
    }),
    '_movie_id': string,
    '_movie_name': string,
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
_critic_consensus	Tensor	string
_critics	Sequence
_critics/key	Tensor	string
_critics/value	Tensor	string
_movie_id	Tensor	string
_movie_name	Tensor	string
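Because `_critics` is a Sequence of a dictionary, TFDS returns it as a dictionary of parallel arrays (`key` and `value`) rather than as a list of pairs. The sketch below re-pairs them for one example. It assumes the example has already been converted to NumPy or plain Python values (for instance via `tfds.as_numpy`), where strings arrive as `bytes`; `pair_critics` is a hypothetical helper name, and the sample record in the usage note is invented, not taken from the dataset.

```python
def _text(x):
    """Decode bytes to str; pass str values through unchanged."""
    return x.decode("utf-8") if isinstance(x, bytes) else x


def pair_critics(example: dict) -> list:
    """Zip the parallel 'key' and 'value' arrays of '_critics' into (key, value) pairs."""
    critics = example["_critics"]
    return [(_text(k), _text(v)) for k, v in zip(critics["key"], critics["value"])]
```

For a made-up record such as `{"_critics": {"key": [b"1", b"2"], "value": [b"Great.", b"Dull."]}}`, the helper yields one `(key, value)` tuple per critic entry. The same approach should apply to `_argument_sentences` in the `idebate` config, which has the same `key`/`value` layout.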

Supervised keys (See as_supervised doc): ('_critics', '_critic_consensus')

opinion_abstracts/idebate

Config description: 2,259 claims for 676 debates.
Dataset size: 3.15 MiB
Splits:

Split	Examples
`'train'`	2,259

Feature structure:

FeaturesDict({
    '_argument_sentences': Sequence({
        'key': string,
        'value': string,
    }),
    '_claim': string,
    '_claim_id': string,
    '_debate_name': string,
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
_argument_sentences	Sequence
_argument_sentences/key	Tensor	string
_argument_sentences/value	Tensor	string
_claim	Tensor	string
_claim_id	Tensor	string
_debate_name	Tensor	string

Supervised keys (See as_supervised doc): ('_argument_sentences', '_claim')
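The supervised keys determine what `as_supervised=True` yields: a 2-tuple of (input, target) instead of the full feature dictionary. The sketch below shows both the TFDS call and a plain-Python equivalent that can be tried on a dictionary without TFDS. The names `load_supervised` and `split_supervised` are hypothetical helpers for illustration only.

```python
# Supervised keys as listed on this page for the idebate config.
SUPERVISED_KEYS = ("_argument_sentences", "_claim")


def split_supervised(example: dict, keys=SUPERVISED_KEYS):
    """Mimic as_supervised=True on a plain dict: return (input, target)."""
    input_key, target_key = keys
    return example[input_key], example[target_key]


def load_supervised():
    """Load idebate as (argument_sentences, claim) tuples (requires TFDS)."""
    # Imported lazily so split_supervised can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("opinion_abstracts/idebate", split="train", as_supervised=True)
```

Here the input is itself a dictionary with `key` and `value` arrays, so a model pipeline would typically still need to select or join the `value` strings before use.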