Google I/O returns May 18-20! Reserve space and build your schedule Register now


  • Description:

Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

The data is a CSV with emoticons removed. Data file format has 6 fields:

  1. the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
  2. the id of the tweet (2087)
  3. the date of the tweet (Sat May 16 23:58:44 UTC 2009)
  4. the query (lyx). If there is no query, then this value is NO_QUERY.
  5. the user that tweeted (robotickilldozr)
  6. the text of the tweet (Lyx is cool)

For more information, refer to the paper Twitter Sentiment Classification with Distant Supervision at

Split Examples
'test' 498
'train' 1,600,000
  • Features:
    'date': Text(shape=(), dtype=tf.string),
    'polarity': tf.int32,
    'query': Text(shape=(), dtype=tf.string),
    'text': Text(shape=(), dtype=tf.string),
    'user': Text(shape=(), dtype=tf.string),
  • Citation:
@ONLINE {Sentiment140,
    author = "Go, Alec and Bhayani, Richa and Huang, Lei",
    title  = "Twitter Sentiment Classification using Distant Supervision",
    year   = "2009",
    url    = ""