  • Description:

A benchmark corpus for measuring progress in statistical language modeling. The training data contains almost one billion words.

  • Splits:
Split      Examples
'test'     306,688
'train'    30,301,028
  • Feature structure:
    'text': Text(shape=(), dtype=string),
  • Feature documentation:
Feature    Class    Shape    Dtype     Description
text       Text              string
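
The single `text` feature described above can be read with a few lines of Python. The following is a minimal sketch, not a definitive recipe: it assumes the `tensorflow_datasets` package is installed and that the corpus is registered under the name `lm1b`, which this section does not state. The helper names `load_text_examples` and `split_fraction` are hypothetical and exist only for illustration. The split sizes are copied from the table above.

```python
# Split sizes copied from the split table in this section.
SPLIT_SIZES = {"train": 30_301_028, "test": 306_688}


def split_fraction(split):
    """Return the share of all examples that fall into `split`."""
    return SPLIT_SIZES[split] / sum(SPLIT_SIZES.values())


def load_text_examples(split="test", limit=3, name="lm1b"):
    """Yield up to `limit` decoded strings from the 'text' feature.

    Assumption: the dataset is registered as `name` in tensorflow_datasets;
    the default "lm1b" is a guess, not taken from this section.
    """
    # Imported lazily so the helpers above work without the package installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(name, split=split)
    for example in tfds.as_numpy(ds.take(limit)):
        # A scalar Text feature arrives as UTF-8 encoded bytes.
        yield example["text"].decode("utf-8")
```

Calling `list(load_text_examples("test", limit=3))` would trigger a download on first use, which is large for a corpus of this size, so reading from the smaller `test` split first is the cheaper way to inspect the data. `split_fraction("test")` shows that the test split holds roughly 1% of all examples.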
  • Citation:
