
sentiment140

Description:

Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

The data is a CSV file with emoticons removed. Each row has 6 fields, in this order:

the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
the id of the tweet (2087)
the date of the tweet (Sat May 16 23:58:44 UTC 2009)
the query (lyx). If there is no query, then this value is NO_QUERY.
the user that tweeted (robotickilldozr)
the text of the tweet (Lyx is cool)
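
The field layout above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the TFDS builder's own parsing code: the column names in `FIELDS` are labels chosen here, the sample row is assembled from the example values in the list, and its polarity of 4 is an assumption made for the example.

```python
import csv
import io

# Column order as listed above. The names are illustrative labels,
# not a header taken from the file.
FIELDS = ["polarity", "id", "date", "query", "user", "text"]
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def parse_rows(lines):
    """Yield one dict per CSV row, with polarity converted to int."""
    reader = csv.DictReader(lines, fieldnames=FIELDS)
    for row in reader:
        row["polarity"] = int(row["polarity"])
        yield row


# Sample row built from the example values above (polarity 4 is assumed).
sample = io.StringIO(
    '"4","2087","Sat May 16 23:58:44 UTC 2009",'
    '"lyx","robotickilldozr","Lyx is cool"\n'
)
rows = list(parse_rows(sample))
label = POLARITY_LABELS[rows[0]["polarity"]]
```

When reading the downloaded file instead of an in-memory string, pass an open file object to `parse_rows`; the file's encoding is not stated here and may need to be set explicitly.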

For more information, refer to the paper "Twitter Sentiment Classification using Distant Supervision" at https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf

Additional Documentation: Explore on Papers With Code
Homepage: http://help.sentiment140.com/home
Source code: tfds.datasets.sentiment140.Builder
Versions:
- 1.0.0 (default): No release notes.
Download size: 77.59 MiB
Dataset size: 305.13 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	498
`'train'`	1,600,000
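
Because the train split is large, it can help to load only a slice of it while experimenting. The sketch below uses the TFDS split-slicing syntax with `tfds.load`. The helper names `percent_slice` and `load_subset` are hypothetical, and the example count is simple arithmetic on the table above rather than a value read from the dataset.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"test": 498, "train": 1_600_000}


def percent_slice(split, percent):
    """Build a TFDS slicing string such as 'train[:1%]'."""
    return f"{split}[:{percent}%]"


def load_subset(split="train", percent=1):
    """Load the first `percent` percent of a split (downloads on first use)."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load("sentiment140", split=percent_slice(split, percent))


spec = percent_slice("train", 1)
# 1% of the train split, computed from the table above.
expected_examples = SPLIT_SIZES["train"] * 1 // 100
```

Calling `load_subset()` triggers the download and preparation step on first use, so it is left uncalled here.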

Feature structure:

FeaturesDict({
    'date': Text(shape=(), dtype=string),
    'polarity': int32,
    'query': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
    'user': Text(shape=(), dtype=string),
})
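
Each example is a dictionary keyed by the feature names above. Since `polarity` takes the values 0, 2, and 4, a classifier that expects contiguous class indices may need a remapping step. The following is a sketch under that assumption; `polarity_to_class` and `load_text_and_class` are hypothetical helpers, not part of TFDS.

```python
def polarity_to_class(polarity):
    """Map polarity 0/2/4 to contiguous class indices 0/1/2."""
    return polarity // 2


def load_text_and_class(split="train"):
    """Return a dataset of (text, class_index) pairs."""
    # Imported lazily so polarity_to_class can be used without TensorFlow.
    import tensorflow_datasets as tfds

    ds = tfds.load("sentiment140", split=split)
    # Each element is a dict with the keys shown in the feature structure.
    return ds.map(lambda ex: (ex["text"], polarity_to_class(ex["polarity"])))


classes = [polarity_to_class(p) for p in (0, 2, 4)]
```

Integer division works here only because the three polarity values are evenly spaced; an explicit lookup table would be the safer choice if other values could appear.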

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
date	Text	string
polarity	Tensor	int32
query	Text	string
text	Text	string
user	Text	string

Supervised keys (See as_supervised doc): ('text', 'polarity')
Figure (tfds.show_examples): Not supported.

Citation:

@ONLINE {Sentiment140,
    author = "Go, Alec and Bhayani, Richa and Huang, Lei",
    title  = "Twitter Sentiment Classification using Distant Supervision",
    year   = "2009",
    url    = "http://help.sentiment140.com/home"
}