lambada

Description:

The LAMBADA dataset evaluates the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word.
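
The word prediction task can be sketched in a few lines. This is a minimal illustration rather than an official evaluation script: it assumes passages are whitespace-tokenized so that the final token is the target word, and `split_passage` and `last_word_accuracy` are hypothetical helper names introduced here. The sample sentence is made up and does not come from the dataset.

```python
from typing import Callable, Iterable, Tuple


def split_passage(passage: str) -> Tuple[str, str]:
    """Split a passage into (context, target), where target is the last token.

    Assumption: tokens are separated by whitespace.
    """
    tokens = passage.split()
    if len(tokens) < 2:
        raise ValueError("passage needs at least one context token and a target")
    return " ".join(tokens[:-1]), tokens[-1]


def last_word_accuracy(
    passages: Iterable[str], predict: Callable[[str], str]
) -> float:
    """Fraction of passages for which predict(context) equals the last word."""
    correct = 0
    total = 0
    for passage in passages:
        context, target = split_passage(passage)
        correct += int(predict(context) == target)
        total += 1
    return correct / total if total else 0.0


# Made-up sentence for illustration only (not taken from the dataset).
context, target = split_passage("the cat sat on the mat")
```

Here `predict` stands in for any model wrapper that maps a context string to a single predicted word; exact string match is one possible scoring rule, chosen for simplicity.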

Additional Documentation: Explore on Papers With Code
Homepage: https://zenodo.org/record/2630551#.X4Xzn5NKjUI
Source code: tfds.datasets.lambada.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 319.03 MiB
Dataset size: 3.49 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	5,153
`'train'`	4,869

Feature structure:

FeaturesDict({
    'passage': Text(shape=(), dtype=string),
})
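
Given this feature structure, reading passages might look like the sketch below. `tfds.load` and `tfds.as_numpy` are TFDS calls; `decode_passage` and `iter_passages` are hypothetical helpers introduced here for illustration. The TFDS import is deferred into the function so that the decoding helper can be used without TFDS installed, and the first call to `iter_passages` will download and prepare the data.

```python
from typing import Iterator, Mapping


def decode_passage(example: Mapping[str, bytes]) -> str:
    """Return the 'passage' feature as a Python str.

    Text features arrive as UTF-8 bytes when iterating NumPy examples.
    """
    raw = example["passage"]
    return raw.decode("utf-8") if isinstance(raw, bytes) else str(raw)


def iter_passages(split: str = "test") -> Iterator[str]:
    """Yield decoded passages from a LAMBADA split ('train' or 'test')."""
    # Deferred import: requires tensorflow-datasets and TensorFlow.
    import tensorflow_datasets as tfds

    ds = tfds.load("lambada", split=split)
    for example in tfds.as_numpy(ds):
        yield decode_passage(example)
```

A typical use would be `for passage in iter_passages("test"): ...`, which streams one passage string at a time.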

Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
passage	Text		string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{paperno-etal-2016-lambada,
    title = "The {LAMBADA} dataset: Word prediction requiring a broad discourse context",
    author = "Paperno, Denis  and
      Kruszewski, Germ{\'a}n  and
      Lazaridou, Angeliki  and
      Pham, Ngoc Quan  and
      Bernardi, Raffaella  and
      Pezzelle, Sandro  and
      Baroni, Marco  and
      Boleda, Gemma  and
      Fern{\'a}ndez, Raquel",
    booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2016",
    address = "Berlin, Germany",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P16-1144",
    doi = "10.18653/v1/P16-1144",
    pages = "1525--1534",
}