sci_tail

Description:

The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and its correct answer choice are converted into an assertive statement to form the hypothesis. Information retrieval is used to obtain relevant text from a large corpus of web sentences, and these sentences serve as the premise P. Crowdsourced annotators label each premise-hypothesis pair as supports (entails) or not (neutral) to create the SciTail dataset. The dataset contains 27,026 examples: 10,101 labeled entails and 16,925 labeled neutral.
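
The label counts quoted in the description imply an imbalanced two-class problem. The following is a minimal sketch, using only those two counts, of how the class shares and a majority-class baseline could be computed; the variable names are illustrative and not part of TFDS.

```python
# Label counts as quoted in the description above.
LABEL_COUNTS = {"entails": 10_101, "neutral": 16_925}

total = sum(LABEL_COUNTS.values())
shares = {label: count / total for label, count in LABEL_COUNTS.items()}

# A classifier that always predicts the most frequent label reaches this accuracy.
majority_label = max(LABEL_COUNTS, key=LABEL_COUNTS.get)

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline ({majority_label}): {shares[majority_label]:.1%}")
```

Any model trained on this data should be compared against that baseline rather than against 50%.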

Additional Documentation: Explore on Papers With Code
Homepage: https://allenai.org/data/scitail
Source code: tfds.datasets.sci_tail.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 13.52 MiB
Dataset size: 6.01 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,126
`'train'`	23,097
`'validation'`	1,304
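
As a quick check on the table above, the sketch below tabulates each split's share of the examples. It uses only the counts listed in the table; the variable names are illustrative.

```python
# Example counts per split, copied from the table above.
SPLIT_SIZES = {"test": 2_126, "train": 23_097, "validation": 1_304}

total = sum(SPLIT_SIZES.values())

# Print the splits from largest to smallest with their share of all examples.
for name, size in sorted(SPLIT_SIZES.items(), key=lambda item: -item[1]):
    print(f"{name:<10} {size:>6,} {size / total:6.1%}")
print(f"{'total':<10} {total:>6,}")
```

Note that the three splits sum to 26,527 examples, which is lower than the 27,026 quoted in the description; this page does not explain the difference.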

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'premise': Text(shape=(), dtype=string),
})
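
The feature structure above means each example is a dictionary with two strings and an integer label. The following is a minimal sketch of loading a split and turning examples into plain Python values. It assumes `tensorflow_datasets` is installed and that the data can be downloaded on first use; `load_sci_tail` and `decode_example` are hypothetical helper names, not part of TFDS. The label names are read from the dataset metadata rather than hard-coded, since this page does not state which index maps to which label.

```python
def load_sci_tail(split="train"):
    """Load one split of sci_tail plus its metadata (downloads on first use)."""
    # Imported lazily so decode_example can be used without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load("sci_tail", split=split, with_info=True)
    return ds, info


def decode_example(example, label_names=None):
    """Turn one NumPy-style example dict into plain Python values."""

    def to_text(value):
        # String features arrive as bytes when iterating with tfds.as_numpy.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    label = int(example["label"])
    return {
        "premise": to_text(example["premise"]),
        "hypothesis": to_text(example["hypothesis"]),
        "label": label_names[label] if label_names else label,
    }


# Usage sketch (requires tensorflow_datasets and network access):
#   import tensorflow_datasets as tfds
#   ds, info = load_sci_tail("validation")
#   names = info.features["label"].names
#   for raw in tfds.as_numpy(ds.take(2)):
#       print(decode_example(raw, names))
```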

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
label	ClassLabel	int64
premise	Text	string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
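
Because no supervised keys are defined, `as_supervised=True` is not expected to be usable with this dataset, so the `(inputs, label)` pairs have to be built by hand. The following is a minimal sketch; `to_supervised` is a hypothetical helper name, and pairing the premise with the hypothesis as the input is an assumption about how a model would consume the data.

```python
def to_supervised(example):
    """Map a sci_tail feature dict to an ((premise, hypothesis), label) pair."""
    inputs = (example["premise"], example["hypothesis"])
    return inputs, example["label"]


# Usage sketch with a tf.data.Dataset loaded through tfds.load:
#   supervised_ds = ds.map(to_supervised)
```

The function only indexes into the dictionary, so it works both on plain Python dicts and inside `tf.data.Dataset.map`.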

Citation:

@inproceedings{khot2018scitail,
    title={Scitail: A textual entailment dataset from science question answering},
    author={Khot, Tushar and Sabharwal, Ashish and Clark, Peter},
    booktitle={Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018)},
    url = "http://ai2-website.s3.amazonaws.com/publications/scitail-aaai-2018_cameraready.pdf",
    year={2018}
}