asqa


  • Description:

ASQA is the first long-form question answering dataset that focuses on ambiguous factoid questions. Unlike previous long-form answer datasets, each question is annotated with both long-form answers and extractive question-answer pairs, which should be answerable from the generated passage. A generated long-form answer is evaluated using both ROUGE and QA accuracy. We showed that these evaluation metrics correlate well with human judgment. In this repository we release the ASQA dataset, together with the evaluation code: https://github.com/google-research/language/tree/master/language/asqa
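To make the role of the extractive question-answer pairs concrete, here is a minimal sketch of a string-match check: it reports the fraction of QA pairs for which at least one short answer appears in a generated long-form answer. This is a simplified stand-in, not the ROUGE or QA-accuracy evaluation from the released code; the function names `normalize` and `short_answer_recall`, the normalization rules, and the example data are all assumptions made for illustration.

```python
from __future__ import annotations

import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def short_answer_recall(long_answer: str, qa_pairs: list[dict]) -> float:
    """Fraction of QA pairs with at least one short answer found in the text.

    Hypothetical helper: a string-match proxy, not the official metric.
    """
    if not qa_pairs:
        return 0.0
    # Pad with spaces so that matches respect token boundaries.
    haystack = f" {normalize(long_answer)} "
    hits = 0
    for pair in qa_pairs:
        candidates = (normalize(ans) for ans in pair["short_answers"])
        if any(c and f" {c} " in haystack for c in candidates):
            hits += 1
    return hits / len(qa_pairs)


# Made-up example data, shaped like the 'qa_pairs' feature.
generated = "The first film premiered in 1977, while the remake was released in 2015."
pairs = [
    {"question": "When did the first film premiere?", "short_answers": ["1977"]},
    {"question": "When was the remake released?", "short_answers": ["December 2015", "2015"]},
    {"question": "When was the sequel released?", "short_answers": ["1983"]},
]
recall = short_answer_recall(generated, pairs)
print(f"short-answer recall: {recall:.2f}")
```

In this example two of the three questions have a short answer present in the generated text. A string match of this kind only checks surface overlap, so it should be read as a rough sanity check rather than a measure of answer quality.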

  • Splits:
Split | Examples
'dev' | 948
'train' | 4,353
  • Feature structure:
FeaturesDict({
    'ambiguous_question': Text(shape=(), dtype=tf.string),
    'annotations': Sequence({
        'knowledge': Sequence({
            'content': Text(shape=(), dtype=tf.string),
            'wikipage': Text(shape=(), dtype=tf.string),
        }),
        'long_answer': Text(shape=(), dtype=tf.string),
    }),
    'qa_pairs': Sequence({
        'context': Text(shape=(), dtype=tf.string),
        'question': Text(shape=(), dtype=tf.string),
        'short_answers': Sequence(Text(shape=(), dtype=tf.string)),
        'wikipage': Text(shape=(), dtype=tf.string),
    }),
    'sample_id': tf.int32,
    'wikipages': Sequence({
        'title': Text(shape=(), dtype=tf.string),
        'url': Text(shape=(), dtype=tf.string),
    }),
})
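When a `Sequence` wraps a dictionary of features, as `qa_pairs` and `wikipages` do here, TensorFlow Datasets exposes it as a dictionary of sequences (one aligned list per sub-feature) rather than a list of dictionaries. The sketch below shows one way to turn that column-wise layout back into per-item records. It works on plain Python lists; the helper name `to_records` and the example values are assumptions for illustration, and tensors returned by the library would first need converting to lists.

```python
def to_records(columns: dict) -> list:
    """Convert a dict of equal-length lists into a list of dicts.

    Hypothetical helper: {'a': [1, 2], 'b': [3, 4]} becomes
    [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}].
    """
    keys = list(columns)
    if not keys:
        return []
    length = len(columns[keys[0]])
    if any(len(columns[key]) != length for key in keys):
        raise ValueError("all columns must have the same length")
    return [{key: columns[key][i] for key in keys} for i in range(length)]


# Made-up values laid out column-wise, mirroring the 'qa_pairs' feature.
qa_pairs_columns = {
    "context": ["No context provided", "Some passage text."],
    "question": ["Who sang the 1965 version?", "Who sang the 1990 version?"],
    "short_answers": [["Artist A"], ["Artist B", "B"]],
    "wikipage": ["", "Some page title"],
}

records = to_records(qa_pairs_columns)
for record in records:
    print(record["question"], "->", record["short_answers"])
```

Keeping the conversion as a separate step leaves the loaded example untouched, which is convenient when some code paths want columns (batching) and others want records (inspection).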
  • Feature documentation:
Feature | Class | Shape | Dtype | Description
 | FeaturesDict | | |
ambiguous_question | Text | | tf.string | Ambiguous question from AmbigQA.
annotations | Sequence | | | Long-form answers to the ambiguous question constructed by ASQA annotators.
annotations/knowledge | Sequence | | | List of additional knowledge pieces.
annotations/knowledge/content | Text | | tf.string | A passage from Wikipedia.
annotations/knowledge/wikipage | Text | | tf.string | Title of the Wikipedia page the passage was taken from.
annotations/long_answer | Text | | tf.string | Annotation.
qa_pairs | Sequence | | | Q&A pairs from AmbigQA which are used for disambiguation.
qa_pairs/context | Text | | tf.string | Additional context provided.
qa_pairs/question | Text | | tf.string |
qa_pairs/short_answers | Sequence(Text) | (None,) | tf.string | List of short answers from AmbigQA.
qa_pairs/wikipage | Text | | tf.string | Title of the Wikipedia page the additional context was taken from.
sample_id | Tensor | | tf.int32 |
wikipages | Sequence | | | List of Wikipedia pages visited by AmbigQA annotators.
wikipages/title | Text | | tf.string | Title of the Wikipedia page.
wikipages/url | Text | | tf.string | Link to the Wikipedia page.
  • Citation:
@misc{https://doi.org/10.48550/arxiv.2204.06092,
doi = {10.48550/ARXIV.2204.06092},
url = {https://arxiv.org/abs/2204.06092},
author = {Stelmakh, Ivan and Luan, Yi and Dhingra, Bhuwan and Chang, Ming-Wei},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {ASQA: Factoid Questions Meet Long-Form Answers},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}