• Description:

Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

The data is a CSV file with emoticons removed. Each row has 6 fields:

  1. the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
  2. the id of the tweet (2087)
  3. the date of the tweet (Sat May 16 23:58:44 UTC 2009)
  4. the query (lyx). If there is no query, then this value is NO_QUERY.
  5. the user that tweeted (robotickilldozr)
  6. the text of the tweet (Lyx is cool)
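
The six-field layout above can be read with Python's standard csv module. The following is a minimal sketch rather than the dataset's official loader: the column names in FIELDS and the helper parse_rows are hypothetical, and it assumes the file has no header row and that fields are quoted.

```python
import csv
import io

# Hypothetical column names for the six fields listed above, in file order.
FIELDS = ["polarity", "id", "date", "query", "user", "text"]

# Labels for the three polarity codes given in the field list.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def parse_rows(lines):
    """Yield one dict per CSV row, with polarity converted to int and labeled."""
    reader = csv.DictReader(lines, fieldnames=FIELDS)
    for row in reader:
        row["polarity"] = int(row["polarity"])
        row["label"] = POLARITY_LABELS[row["polarity"]]
        yield row


# Sample row built from the example values in the field list.
# The polarity of 4 is an assumption made for illustration.
sample = io.StringIO(
    '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"\n'
)
rows = list(parse_rows(sample))
print(rows[0]["label"], "-", rows[0]["text"])
```

For a real file, the same generator could be fed an open file handle instead of the in-memory sample; the correct text encoding for the file is not stated here and would need to be checked.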

For more information, refer to the paper "Twitter Sentiment Classification using Distant Supervision", listed under Citation below.

  • Splits:
Split    Examples
'test'   498
'train'  1,600,000
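
One way to work with these splits is through TensorFlow Datasets. The sketch below assumes the tensorflow_datasets package is installed and that the dataset is registered under the name "sentiment140"; the helper names load_split and train_fraction are hypothetical. The split sizes are copied from the listing above.

```python
# Split sizes as given in the split listing.
EXPECTED_EXAMPLES = {"test": 498, "train": 1_600_000}


def load_split(split="test", name="sentiment140"):
    """Load one split through TensorFlow Datasets.

    Assumes `tensorflow_datasets` is installed and that the dataset is
    registered under `name`; the first call downloads the data.
    """
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(EXPECTED_EXAMPLES)}"
        )
    # Imported lazily so this module can be imported without the package.
    import tensorflow_datasets as tfds

    return tfds.load(name, split=split)


def train_fraction():
    """Share of all examples that fall in the 'train' split."""
    total = sum(EXPECTED_EXAMPLES.values())
    return EXPECTED_EXAMPLES["train"] / total
```

Because the 'test' split holds only 498 examples, it is the cheaper one to iterate when checking that the download and feature decoding work.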
  • Feature structure:
FeaturesDict({
    'date': Text(shape=(), dtype=string),
    'polarity': int32,
    'query': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
    'user': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature    Class    Shape   Dtype    Description
date       Text             string
polarity   Tensor           int32
query      Text             string
text       Text             string
user       Text             string
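
The five features above can be turned into plain Python values before further processing. This is a minimal sketch under stated assumptions: string features are assumed to arrive as UTF-8 bytes (as they commonly do when a dataset is iterated as NumPy values), and the helper decode_example and the "label" key are hypothetical additions, not part of the dataset.

```python
# Labels for the polarity codes documented in the description.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}

# The four string-typed features from the feature table.
TEXT_FEATURES = ("date", "query", "text", "user")


def decode_example(example):
    """Convert one feature dict into plain Python values.

    String features may arrive as bytes (assumed UTF-8 here); polarity may
    arrive as a NumPy integer, so it is coerced with int().
    """
    decoded = {}
    for key in TEXT_FEATURES:
        value = example[key]
        decoded[key] = value.decode("utf-8") if isinstance(value, bytes) else str(value)
    decoded["polarity"] = int(example["polarity"])
    decoded["label"] = POLARITY_LABELS.get(decoded["polarity"], "unknown")
    return decoded


# Hand-written example shaped like the feature structure; it is not a record
# loaded from the dataset, and the polarity of 4 is assumed for illustration.
raw = {
    "date": b"Sat May 16 23:58:44 UTC 2009",
    "polarity": 4,
    "query": b"lyx",
    "text": b"Lyx is cool",
    "user": b"robotickilldozr",
}
print(decode_example(raw))
```

Note that the tweet id from the six-field file format does not appear among the features, so it is not part of the decoded record.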
  • Citation:
@ONLINE {Sentiment140,
    author = "Go, Alec and Bhayani, Richa and Huang, Lei",
    title  = "Twitter Sentiment Classification using Distant Supervision",
    year   = "2009",
    url    = ""
}