imdb_reviews

Description:

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Additional Documentation: Explore on Papers With Code
Config description: Plain text
Homepage: http://ai.stanford.edu/~amaas/data/sentiment/
Source code: tfds.datasets.imdb_reviews.Builder
Versions:
- 1.0.0 (default): New split API (https://tensorflow.org/datasets/splits)
Download size: 80.23 MiB
Dataset size: 129.83 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	25,000
`'train'`	25,000
`'unsupervised'`	50,000
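
The split names above can be passed to `tfds.load`, and the split API linked under Versions also accepts percentage slices. Since the dataset ships no validation split, one option is to carve one out of `'train'`. The following is a minimal sketch, assuming `tensorflow_datasets` is installed; the helper names and the 80/20 ratio are illustrative and not part of the catalog entry.

```python
def train_validation_splits(train_percent=80):
    """Return two split strings that divide 'train' into train and validation parts."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 1 and 99")
    return f"train[:{train_percent}%]", f"train[{train_percent}%:]"


def load_train_validation_test(train_percent=80):
    """Load imdb_reviews as three datasets: train slice, validation slice, test."""
    # Imported lazily so the string helper above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    train_split, validation_split = train_validation_splits(train_percent)
    # The first call downloads the archive (about 80 MiB according to this page).
    train_ds, validation_ds, test_ds = tfds.load(
        "imdb_reviews",
        split=[train_split, validation_split, "test"],
    )
    return train_ds, validation_ds, test_ds
```

With the default argument, the helper produces `train[:80%]` and `train[80%:]`, which together cover the 25,000 training examples without overlap.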

Feature structure:

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'text': Text(shape=(), dtype=string),
})
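
Loaded without `as_supervised`, each example is a dictionary holding the two features above. When iterated through `tfds.as_numpy`, `text` should arrive as raw bytes and `label` as a NumPy integer. The sketch below converts one such example to plain Python types; the helper names and the choice of UTF-8 decoding are assumptions made for illustration, not part of TFDS.

```python
def to_python(example):
    """Convert one imdb_reviews example (dict form) to plain Python types."""
    text = example["text"]
    if isinstance(text, bytes):
        # Assumption: reviews are UTF-8 encoded; undecodable bytes are replaced.
        text = text.decode("utf-8", errors="replace")
    return {"text": text, "label": int(example["label"])}


def iter_reviews(split="train", limit=3):
    """Yield the first `limit` examples of a split as plain Python dicts."""
    # Imported lazily so to_python can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("imdb_reviews", split=split)
    for example in tfds.as_numpy(ds.take(limit)):
        yield to_python(example)
```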

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
label	ClassLabel	int64
text	Text	string

Supervised keys (See as_supervised doc): ('text', 'label')
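
With `as_supervised=True`, `tfds.load` yields `(text, label)` tuples in the order given by the supervised keys, rather than feature dictionaries. The sketch below mirrors that mapping in plain Python and then shows the corresponding load call. `SUPERVISED_KEYS`, `as_pair`, and `load_supervised` are illustrative names, not TFDS identifiers.

```python
SUPERVISED_KEYS = ("text", "label")  # copied from the "Supervised keys" field of this page


def as_pair(example, keys=SUPERVISED_KEYS):
    """Pick the supervised keys out of a feature dict, in order."""
    return tuple(example[key] for key in keys)


def load_supervised(split="train"):
    """Load a split whose elements are (text, label) tuples of tensors."""
    # Imported lazily so as_pair can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load("imdb_reviews", split=split, as_supervised=True)
```

The tuple form is convenient for training APIs that expect `(input, target)` pairs, while the dictionary form keeps the feature names explicit.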
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

imdb_reviews/plain_text (default config)
