trivia_qa

  • Description:

TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions.
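
To make the idea of distant supervision concrete, the sketch below marks an evidence document as a positive when any answer alias occurs in it. The `normalize` and `find_alias_matches` helpers and the sample strings are assumptions made for illustration; they are not TriviaQA's official normalization or data.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def find_alias_matches(document: str, aliases: List[str]) -> List[str]:
    """Return the aliases whose normalized form occurs as whole words in the document."""
    doc = f" {normalize(document)} "
    return [alias for alias in aliases if f" {normalize(alias)} " in doc]


# Made-up evidence text and aliases, not taken from the dataset.
evidence = "The Eiffel Tower, completed in 1889, stands in Paris, France."
aliases = ["Paris", "City of Light"]

matches = find_alias_matches(evidence, aliases)
# A document with at least one alias match would count as (noisy) supervision.
is_positive = bool(matches)
```

The label is "distant" because nobody verified that the matched passage actually answers the question; the match only makes that likely.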

  • Splits:
Split          Examples
'test'           10,832
'train'          87,622
'validation'     11,313
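
As a quick sanity check on the split table, the following sketch adds up the listed sizes and derives each split's share. Only the three counts come from this page; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"test": 10_832, "train": 87_622, "validation": 11_313}

total = sum(split_sizes.values())

# Fraction of all examples that falls in each split.
split_shares = {name: count / total for name, count in split_sizes.items()}

for name, share in split_shares.items():
    print(f"{name:<12}{split_sizes[name]:>8,}  {share:6.1%}")
print(f"{'total':<12}{total:>8,}")
```

The total of 109,767 is close to the 110k figure quoted for the unfiltered configs, which suggests, though the page does not say so, that the table describes one of those configs.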
  • Feature structure:
FeaturesDict({
    'answer': FeaturesDict({
        'aliases': Sequence(Text(shape=(), dtype=tf.string)),
        'matched_wiki_entity_name': Text(shape=(), dtype=tf.string),
        'normalized_aliases': Sequence(Text(shape=(), dtype=tf.string)),
        'normalized_matched_wiki_entity_name': Text(shape=(), dtype=tf.string),
        'normalized_value': Text(shape=(), dtype=tf.string),
        'type': Text(shape=(), dtype=tf.string),
        'value': Text(shape=(), dtype=tf.string),
    }),
    'entity_pages': Sequence({
        'doc_source': Text(shape=(), dtype=tf.string),
        'filename': Text(shape=(), dtype=tf.string),
        'title': Text(shape=(), dtype=tf.string),
        'wiki_context': Text(shape=(), dtype=tf.string),
    }),
    'question': Text(shape=(), dtype=tf.string),
    'question_id': Text(shape=(), dtype=tf.string),
    'question_source': Text(shape=(), dtype=tf.string),
    'search_results': Sequence({
        'description': Text(shape=(), dtype=tf.string),
        'filename': Text(shape=(), dtype=tf.string),
        'rank': tf.int32,
        'search_context': Text(shape=(), dtype=tf.string),
        'title': Text(shape=(), dtype=tf.string),
        'url': Text(shape=(), dtype=tf.string),
    }),
})
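
The nested structure above can be navigated like an ordinary dictionary. The sketch below assumes `tensorflow_datasets` is installed for the loader (it is imported lazily and not called here); `load_split`, `iter_documents`, and the hand-written record are hypothetical names and values used only for illustration. TFDS exposes a `Sequence` of a dict as a dict of parallel sequences, so per-document records are rebuilt by zipping the fields.

```python
from typing import Dict, Iterator, List


def load_split(config: str = "rc.nocontext", split: str = "validation"):
    """Load one split through TFDS (downloads and prepares data on first use)."""
    import tensorflow_datasets as tfds  # lazy import: only needed when loading

    return tfds.load(f"trivia_qa/{config}", split=split)


def iter_documents(pages: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Regroup a dict of parallel lists into one dict per document."""
    keys = list(pages)
    for values in zip(*(pages[key] for key in keys)):
        yield dict(zip(keys, values))


# Hand-written record mirroring part of the feature structure; all values are made up.
example = {
    "question": "Which city is home to the Eiffel Tower?",
    "question_id": "hypothetical_0",
    "answer": {"value": "Paris", "aliases": ["Paris", "City of Light"]},
    "entity_pages": {
        "doc_source": ["example_source", "example_source"],
        "filename": ["Eiffel_Tower.txt", "Paris.txt"],
        "title": ["Eiffel Tower", "Paris"],
        "wiki_context": ["The Eiffel Tower is in Paris.", "Paris is the capital of France."],
    },
}

documents = list(iter_documents(example["entity_pages"]))
titles = [doc["title"] for doc in documents]
```

Records loaded through TFDS hold tensors (byte strings once converted with `tfds.as_numpy`), so values would need decoding before string helpers like these apply.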
  • Feature documentation:
Feature                                     Class           Shape    Dtype      Description
                                            FeaturesDict
answer                                      FeaturesDict
answer/aliases                              Sequence(Text)  (None,)  tf.string
answer/matched_wiki_entity_name             Text                     tf.string
answer/normalized_aliases                   Sequence(Text)  (None,)  tf.string
answer/normalized_matched_wiki_entity_name  Text                     tf.string
answer/normalized_value                     Text                     tf.string
answer/type                                 Text                     tf.string
answer/value                                Text                     tf.string
entity_pages                                Sequence
entity_pages/doc_source                     Text                     tf.string
entity_pages/filename                       Text                     tf.string
entity_pages/title                          Text                     tf.string
entity_pages/wiki_context                   Text                     tf.string
question                                    Text                     tf.string
question_id                                 Text                     tf.string
question_source                             Text                     tf.string
search_results                              Sequence
search_results/description                  Text                     tf.string
search_results/filename                     Text                     tf.string
search_results/rank                         Tensor                   tf.int32
search_results/search_context               Text                     tf.string
search_results/title                        Text                     tf.string
search_results/url                          Text                     tf.string
  • Citation:
@article{2017arXivtriviaqa,
       author = { {Joshi}, Mandar and {Choi}, Eunsol and {Weld},
                 Daniel and {Zettlemoyer}, Luke},
        title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
      journal = {arXiv e-prints},
         year = 2017,
          eid = {arXiv:1705.03551},
        pages = {arXiv:1705.03551},
archivePrefix = {arXiv},
       eprint = {1705.03551},
}

trivia_qa/rc (default config)

  • Config description: Question-answer pairs where all documents for a given question contain the answer string(s). Includes context from Wikipedia and search results.

trivia_qa/rc.nocontext

  • Config description: Question-answer pairs where all documents for a given question contain the answer string(s).

trivia_qa/unfiltered

  • Config description: 110k question-answer pairs for open domain QA where not all documents for a given question contain the answer string(s). This makes the unfiltered dataset more appropriate for IR-style QA. Includes context from Wikipedia and search results.

trivia_qa/unfiltered.nocontext

  • Config description: 110k question-answer pairs for open domain QA where not all documents for a given question contain the answer string(s). This makes the unfiltered dataset more appropriate for IR-style QA.
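
The four config names follow a regular pattern that can be decoded programmatically. The sketch below is an illustration, not part of TFDS: `ConfigInfo` and `describe_config` are hypothetical names, and it assumes the `.nocontext` suffix means the Wikipedia and search-result context is left out, which the descriptions imply but do not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigInfo:
    name: str
    filtered: bool     # True for "rc": every document contains an answer string
    has_context: bool  # Assumed False for the ".nocontext" variants


def describe_config(name: str) -> ConfigInfo:
    """Decode a trivia_qa config name such as 'rc' or 'unfiltered.nocontext'."""
    base, _, suffix = name.partition(".")
    if base not in ("rc", "unfiltered") or suffix not in ("", "nocontext"):
        raise ValueError(f"unknown trivia_qa config: {name!r}")
    return ConfigInfo(name=name, filtered=(base == "rc"), has_context=(suffix == ""))


configs = [
    describe_config(name)
    for name in ("rc", "rc.nocontext", "unfiltered", "unfiltered.nocontext")
]
# Configs suited to IR-style QA, per the descriptions above.
ir_style = [cfg.name for cfg in configs if not cfg.filtered]
```

A helper like this makes it easy to pick a config by property, for example choosing an unfiltered variant when a retrieval step is part of the system.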
