sentiment140

  • Description:

Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

The data is a CSV file with emoticons removed. Each row has six fields:

  1. the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
  2. the id of the tweet (2087)
  3. the date of the tweet (Sat May 16 23:58:44 UTC 2009)
  4. the query (lyx). If there is no query, then this value is NO_QUERY.
  5. the user that tweeted (robotickilldozr)
  6. the text of the tweet (Lyx is cool)
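
As a rough illustration of the six-field layout above, the following sketch parses rows with Python's standard `csv` module. The `Tweet` tuple, the `parse_rows` helper, and the `POLARITY_LABELS` mapping are hypothetical names introduced here, not part of the dataset. The first sample row is assembled from the example values in the list (with polarity 4 picked arbitrarily); the second row is made up to show the NO_QUERY case. It also assumes the file has no header row and that every field is double-quoted.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Optional

# Polarity codes as documented in the field list.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


class Tweet(NamedTuple):
    """One row of the CSV (hypothetical container for illustration)."""
    polarity: int
    tweet_id: int
    date: str             # kept as the raw string from the file
    query: Optional[str]  # None when the file says NO_QUERY
    user: str
    text: str


def parse_rows(lines: Iterable[str]) -> Iterator[Tweet]:
    """Yield a Tweet for each six-field CSV row."""
    for polarity, tweet_id, date, query, user, text in csv.reader(lines):
        yield Tweet(
            polarity=int(polarity),
            tweet_id=int(tweet_id),
            date=date,
            query=None if query == "NO_QUERY" else query,
            user=user,
            text=text,
        )


# Illustrative in-memory data; the second row is invented for this example.
sample = io.StringIO(
    '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"\n'
    '"0","2088","Sat May 16 23:59:01 UTC 2009","NO_QUERY","example_user","a made-up tweet"\n'
)
tweets = list(parse_rows(sample))
for tweet in tweets:
    print(POLARITY_LABELS[tweet.polarity], tweet.query, tweet.text)
```

To read a real copy of the file, pass an open file object to `parse_rows` instead of the `StringIO`; the correct text encoding is not stated here and should be checked against the downloaded file.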

For more information, refer to the paper "Twitter Sentiment Classification using Distant Supervision" at https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf

  • Splits:
Split     Examples
'test'    498
'train'   1,600,000
  • Feature structure:
FeaturesDict({
    'date': Text(shape=(), dtype=string),
    'polarity': int32,
    'query': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
    'user': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature    Class          Shape   Dtype    Description
           FeaturesDict
date       Text                   string
polarity   Tensor                 int32
query      Text                   string
text       Text                   string
user       Text                   string
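
The feature structure above can be consumed roughly as follows. This is a minimal sketch, assuming the dataset is registered in TensorFlow Datasets under the name `sentiment140` and that the string features arrive as UTF-8 bytes when iterated through `tfds.as_numpy`. The helpers `decode_example` and `iter_split` are hypothetical names introduced for illustration. The `tensorflow_datasets` import is deferred so that the pure-Python helper can be used without it.

```python
from typing import Any, Dict, Iterator


def decode_example(example: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one NumPy-style example into plain Python values.

    Text features (bytes) become str; the integer 'polarity' becomes int.
    Assumes the bytes are UTF-8 encoded.
    """
    decoded: Dict[str, Any] = {}
    for name, value in example.items():
        if isinstance(value, bytes):
            decoded[name] = value.decode("utf-8")
        else:
            decoded[name] = int(value)
    return decoded


def iter_split(split: str = "test") -> Iterator[Dict[str, Any]]:
    """Yield decoded examples from one split ('test' or 'train')."""
    # Imported lazily: requires tensorflow_datasets and triggers a
    # download the first time the dataset is loaded.
    import tensorflow_datasets as tfds

    ds = tfds.load("sentiment140", split=split)
    for example in tfds.as_numpy(ds):
        yield decode_example(example)
```

Calling `next(iter_split("test"))` would then return a dictionary with the keys `date`, `polarity`, `query`, `text`, and `user`, matching the FeaturesDict above.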
  • Citation:
@ONLINE {Sentiment140,
    author = "Go, Alec and Bhayani, Richa and Huang, Lei",
    title  = "Twitter Sentiment Classification using Distant Supervision",
    year   = "2009",
    url    = "http://help.sentiment140.com/home"
}