- Description:
The Answer Equivalence Dataset contains human ratings of predictions from several models on the SQuAD dataset. The ratings establish whether a predicted answer is 'equivalent' to the gold answer, taking both the question and the context into account. More specifically, by 'equivalent' we mean that the predicted answer contains at least the same information as the gold answer and does not add superfluous information. The dataset contains annotations for:
  * predictions from BiDAF on SQuAD dev
  * predictions from XLNet on SQuAD dev
  * predictions from Luke on SQuAD dev
  * predictions from Albert on SQuAD training, dev and test examples
Homepage: https://github.com/google-research-datasets/answer-equivalence-dataset
Source code:
tfds.datasets.answer_equivalence.Builder
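A minimal loading sketch, assuming `tensorflow_datasets` is installed and that the dataset is registered under the name `answer_equivalence` (as the builder path above suggests):

```python
import tensorflow_datasets as tfds

# Load the 'train' split together with the DatasetInfo object.
# The first call downloads and prepares the data (~45.86 MiB).
ds, info = tfds.load('answer_equivalence', split='train', with_info=True)

print(info.description)
print(info.splits)
```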
- Versions:
1.0.0 (default): Initial release.
Download size:
45.86 MiB
Dataset size:
47.24 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'ae_dev' | 4,446 |
'ae_test' | 9,724 |
'dev_bidaf' | 7,522 |
'dev_luke' | 4,590 |
'dev_xlnet' | 7,932 |
'train' | 9,090 |
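Any of the splits above can be requested by name. A short sketch (the split name 'dev_bidaf' and its size are taken from the table above):

```python
import tensorflow_datasets as tfds

# Load only the BiDAF dev-set annotations.
ds = tfds.load('answer_equivalence', split='dev_bidaf')

# Cardinality should match the table above (7,522 examples).
print(ds.cardinality().numpy())
```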
- Feature structure:
FeaturesDict({
'candidate': Text(shape=(), dtype=string),
'context': Text(shape=(), dtype=string),
'gold_index': int32,
'qid': Text(shape=(), dtype=string),
'question': Text(shape=(), dtype=string),
'question_1': ClassLabel(shape=(), dtype=int64, num_classes=3),
'question_2': ClassLabel(shape=(), dtype=int64, num_classes=3),
'question_3': ClassLabel(shape=(), dtype=int64, num_classes=3),
'question_4': ClassLabel(shape=(), dtype=int64, num_classes=3),
'reference': Text(shape=(), dtype=string),
'score': float32,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
candidate | Text | | string | |
context | Text | | string | |
gold_index | Tensor | | int32 | |
qid | Text | | string | |
question | Text | | string | |
question_1 | ClassLabel | | int64 | |
question_2 | ClassLabel | | int64 | |
question_3 | ClassLabel | | int64 | |
question_4 | ClassLabel | | int64 | |
reference | Text | | string | |
score | Tensor | | float32 | |
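To see how these features come out in practice, a hedged inspection sketch: the field names come from the feature structure above, and the `int2str` lookup is the standard TFDS ClassLabel API, but the actual label strings behind the 3-way ratings are not documented here.

```python
import tensorflow_datasets as tfds

ds, info = tfds.load('answer_equivalence', split='ae_dev', with_info=True)

# Take one example and convert tensors to plain NumPy/Python values.
for ex in tfds.as_numpy(ds.take(1)):
    print('question: ', ex['question'].decode('utf-8'))
    print('candidate:', ex['candidate'].decode('utf-8'))
    print('reference:', ex['reference'].decode('utf-8'))
    print('score:    ', ex['score'])
    # Map each 3-way ClassLabel rating index back to its string name.
    for key in ('question_1', 'question_2', 'question_3', 'question_4'):
        print(key, '->', info.features[key].int2str(int(ex[key])))
```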
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
- Citation:
@article{bulian-etal-2022-tomayto,
title={Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation},
author={Jannis Bulian and Christian Buck and Wojciech Gajewski and Benjamin Boerschinger and Tal Schuster},
year={2022},
eprint={2202.07654},
archivePrefix={arXiv},
primaryClass={cs.CL}
}