- Description:
The LAMBADA dataset evaluates the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word
Additional Documentation: Explore on Papers With Code
Source code:
tfds.datasets.lambada.Builder
Versions:
1.0.0
(default): Initial release.
Download size:
319.03 MiB
Dataset size:
3.49 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' |
5,153 |
'train' |
4,869 |
- Feature structure:
FeaturesDict({
'passage': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | ||||
passage | Text | string |
Supervised keys (See
as_supervised
doc):None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
- Citation:
@inproceedings{paperno-etal-2016-lambada,
title = "The {LAMBADA} dataset: Word prediction requiring a broad discourse context",
author = "Paperno, Denis and
Kruszewski, Germ{\'a}n and
Lazaridou, Angeliki and
Pham, Ngoc Quan and
Bernardi, Raffaella and
Pezzelle, Sandro and
Baroni, Marco and
Boleda, Gemma and
Fern{\'a}ndez, Raquel",
booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2016",
address = "Berlin, Germany",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P16-1144",
doi = "10.18653/v1/P16-1144",
pages = "1525--1534",
}