istella

Description:

The Istella datasets are three large-scale Learning-to-Rank datasets released by Istella. Each dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels.

The dataset is available in three versions:

main ("Istella LETOR"): contains 10,454,629 query-document pairs.
s ("Istella-S LETOR"): contains 3,408,630 query-document pairs.
x ("Istella-X LETOR"): contains 26,791,447 query-document pairs.

You can specify whether to use the main, s or x version of the dataset as follows:

import tensorflow_datasets as tfds

# Pick one of the three configs:
ds = tfds.load("istella/main")
ds = tfds.load("istella/s")
ds = tfds.load("istella/x")

If only istella is specified, the istella/main option is selected by default:

# This is the same as `tfds.load("istella/main")`
ds = tfds.load("istella")
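
The config can also be chosen programmatically. The following is a minimal sketch, not part of TFDS: the helper names `istella_name` and `load_istella` are hypothetical, and only `tfds.load` with its `split` argument is an actual TFDS call. The valid split names are the ones listed in the split tables on this page (the `main` config lists no `'vali'` split).

```python
VARIANTS = ("main", "s", "x")


def istella_name(variant="main"):
    """Return the TFDS name for an Istella variant, e.g. 'istella/s'."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"istella/{variant}"


def load_istella(variant="main", split="train"):
    """Load one split of the chosen variant (downloads the data on first use)."""
    # Imported here so the name helper above works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(istella_name(variant), split=split)
```

For example, `load_istella("s", split="vali")` would request the validation split of the `istella/s` config.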

Homepage: http://quickrank.isti.cnr.it/istella-dataset/
Source code: tfds.ranking.istella.Istella
Versions:
- 1.0.0: Initial release.
- 1.0.1: Fix serialization to support float64.
- 1.1.0: Bundle features into a single 'float_features' feature.
- 1.2.0 (default): Add query and document identifiers.
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 220), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})
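
The leading `None` dimension suggests that each example holds a variable-length list of documents for a single query, which would also explain why the split sizes are far smaller than the pair counts above. Under that assumption, the sketch below unrolls one example into per-document rows. The helper `iter_pairs` and the toy example are hypothetical. The `bytes` check assumes the example was converted with `tfds.as_numpy`, which returns string features as bytes.

```python
def iter_pairs(example):
    """Yield (query_id, doc_id, label, features) for each document in one example."""
    query_id = example["query_id"]
    if isinstance(query_id, bytes):  # string features arrive as bytes after tfds.as_numpy
        query_id = query_id.decode("utf-8")
    rows = zip(example["doc_id"], example["label"], example["float_features"])
    for doc_id, label, features in rows:
        yield query_id, int(doc_id), float(label), features


# Toy stand-in for one example; real rows carry 220 features, not 3.
toy = {
    "query_id": b"q1",
    "doc_id": [10, 11],
    "label": [0.0, 3.0],
    "float_features": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
}
pairs = list(iter_pairs(toy))
print(pairs[1][:3])  # ('q1', 11, 3.0)
```

The helper only relies on iteration and indexing by key, so it should work on plain lists as well as on NumPy arrays.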

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
doc_id	Tensor	(None,)	int64
float_features	Tensor	(None, 220)	float64
label	Tensor	(None,)	float64
query_id	Text		string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{10.1145/2987380,
  author = {Dato, Domenico and Lucchese, Claudio and Nardini, Franco Maria and Orlando, Salvatore and Perego, Raffaele and Tonellotto, Nicola and Venturini, Rossano},
  title = {Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees},
  year = {2016},
  publisher = {ACM},
  address = {New York, NY, USA},
  volume = {35},
  number = {2},
  issn = {1046-8188},
  url = {https://doi.org/10.1145/2987380},
  doi = {10.1145/2987380},
  journal = {ACM Transactions on Information Systems},
  articleno = {15},
  numpages = {31},
}

istella/main (default config)

Download size: 1.20 GiB
Dataset size: 1.12 GiB
Splits:

Split	Examples
`'test'`	9,799
`'train'`	23,219

istella/s

Download size: 450.26 MiB
Dataset size: 421.88 MiB
Splits:

Split	Examples
`'test'`	6,562
`'train'`	19,245
`'vali'`	7,211

istella/x

Download size: 4.42 GiB
Dataset size: 2.46 GiB
Splits:

Split	Examples
`'test'`	2,000
`'train'`	6,000
`'vali'`	2,000
