- Description:
MSLR-WEB consists of two large-scale Learning-to-Rank datasets released by Microsoft Research. The first dataset (called "30k") contains roughly 30,000 queries (31,531 across the splits of each fold) and the second dataset (called "10k") contains 10,000 queries. Each dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels. In this TFDS version, each example groups all documents of one query.
You can specify whether to use the "10k" or "30k" version of the dataset, and a corresponding fold, as follows:
```python
import tensorflow_datasets as tfds

ds = tfds.load("mslr_web/30k_fold1")
```
If only `mslr_web` is specified, the `mslr_web/10k_fold1` option is selected by default:
```python
# This is the same as `tfds.load("mslr_web/10k_fold1")`
ds = tfds.load("mslr_web")
```
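Config names follow the pattern `<size>_fold<k>`, with size `10k` or `30k` and fold 1 to 5, as listed in the configs below. The helper here is a hypothetical convenience (not part of TFDS) that builds and validates the full name before it is passed to `tfds.load`, falling back to the documented default when arguments are omitted:

```python
from typing import Optional

# Sizes and folds taken from the configs listed on this page.
VALID_SIZES = ("10k", "30k")
VALID_FOLDS = range(1, 6)


def mslr_config_name(size: Optional[str] = None, fold: Optional[int] = None) -> str:
    """Hypothetical helper: build a "mslr_web/<size>_fold<k>" config name.

    Omitted arguments fall back to the default config, mslr_web/10k_fold1.
    """
    size = "10k" if size is None else size
    fold = 1 if fold is None else fold
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if not isinstance(fold, int) or fold not in VALID_FOLDS:
        raise ValueError(f"fold must be an integer from 1 to 5, got {fold!r}")
    return f"mslr_web/{size}_fold{fold}"


# Usage (requires tensorflow_datasets and downloading the data):
# import tensorflow_datasets as tfds
# ds = tfds.load(mslr_config_name("30k", 3), split="train")
```

Validating up front turns a typo such as `"20k"` into an immediate, descriptive error instead of a failed config lookup.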
Homepage: https://www.microsoft.com/en-us/research/project/mslr/
Source code: `tfds.ranking.mslr_web.MslrWeb`
Versions:
- 1.0.0: Initial release.
- 1.1.0: Bundle features into a single 'float_features' feature.
- 1.2.0 (default): Add query and document identifiers.
Auto-cached (documentation): No
Feature structure:
```python
FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 136), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})
```
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | ||||
doc_id | Tensor | (None,) | int64 | |
float_features | Tensor | (None, 136) | float64 | |
label | Tensor | (None,) | float64 | |
query_id | Text | () | string | |
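Each example holds one query: `query_id` is a scalar, while `doc_id`, `label` and the rows of `float_features` have one entry per document, so the list length varies from query to query. Models that expect a fixed list size usually pad or truncate these lists. The sketch below does that in plain Python on a synthetic example shaped like the feature structure above; the helper name `pad_query`, the list size of 4 and the padding label of -1 are illustrative assumptions, not part of TFDS.

```python
from typing import Dict, List, Tuple

NUM_FEATURES = 136  # width of 'float_features' per the feature structure above


def pad_query(example: Dict, list_size: int, pad_label: float = -1.0
              ) -> Tuple[List[List[float]], List[float], List[bool]]:
    """Hypothetical helper: pad or truncate one query's documents to list_size.

    Returns (features, labels, mask), where mask is True for real documents.
    """
    feats = [list(row) for row in example["float_features"][:list_size]]
    labels = list(example["label"][:list_size])
    mask = [True] * len(labels)
    missing = list_size - len(labels)  # never negative after truncation
    feats += [[0.0] * NUM_FEATURES for _ in range(missing)]
    labels += [pad_label] * missing
    mask += [False] * missing
    return feats, labels, mask


# Synthetic query with 3 documents (values are made up for illustration).
example = {
    "query_id": "1",
    "doc_id": [0, 1, 2],
    "float_features": [[0.1] * NUM_FEATURES, [0.2] * NUM_FEATURES, [0.3] * NUM_FEATURES],
    "label": [2.0, 0.0, 1.0],
}
features, labels, mask = pad_query(example, list_size=4)
```

The mask lets a loss or metric ignore padded positions, so the padding label only needs to be a value that never occurs as a real relevance judgment.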
Supervised keys (See `as_supervised` doc): `None`
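Since no supervised keys are defined, `as_supervised=True` is not available for this dataset, and (features, label) pairs have to be assembled from the feature dictionary. With `tf.data` that would typically be `ds.map(lambda ex: (ex["float_features"], ex["label"]))`. The plain-Python sketch below shows the same mapping on toy dicts; `to_supervised` is a hypothetical helper, and the 2-value feature rows stand in for the real 136-value ones.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def to_supervised(examples: Iterable[Dict]
                  ) -> Iterator[Tuple[List[List[float]], List[float]]]:
    """Hypothetical helper: map each per-query example to (features, labels)."""
    for ex in examples:
        # One feature row and one relevance label per document of the query.
        yield ex["float_features"], ex["label"]


# Toy examples; real feature rows have 136 values.
examples = [
    {"query_id": "1", "doc_id": [10, 11],
     "float_features": [[0.5, 1.0], [0.0, 2.0]], "label": [1.0, 0.0]},
    {"query_id": "2", "doc_id": [12],
     "float_features": [[3.0, 0.1]], "label": [4.0]},
]
pairs = list(to_supervised(examples))
```

Each pair still covers a whole query, which is what listwise ranking losses expect.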
Figure (tfds.show_examples): Not supported.
Citation:
@article{DBLP:journals/corr/QinL13,
author = {Tao Qin and Tie{-}Yan Liu},
title = {Introducing {LETOR} 4.0 Datasets},
journal = {CoRR},
volume = {abs/1306.2597},
year = {2013},
url = {http://arxiv.org/abs/1306.2597},
timestamp = {Mon, 01 Jul 2013 20:31:25 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/QinL13},
bibsource = {dblp computer science bibliography, http://dblp.org}
}
mslr_web/10k_fold1 (default config)
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
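The split sizes follow a 3:1:1 ratio: in each fold, three fifths of the queries go to `train` and one fifth each to `vali` and `test`. A quick check using the counts from the table above:

```python
# Query counts per split for mslr_web/10k_fold1, copied from the table above.
splits = {"train": 6_000, "vali": 2_000, "test": 2_000}
total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}
print(total, fractions)  # 10000 {'train': 0.6, 'vali': 0.2, 'test': 0.2}
```

The total matches the 10,000 queries of the "10k" dataset, consistent with one example per query.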
mslr_web/10k_fold2
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
mslr_web/10k_fold3
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
mslr_web/10k_fold4
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
mslr_web/10k_fold5
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
mslr_web/30k_fold1
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,919 |
'vali' | 6,306 |
mslr_web/30k_fold2
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
Split | Examples |
---|---|
'test' | 6,307 |
'train' | 18,918 |
'vali' | 6,306 |
mslr_web/30k_fold3
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,918 |
'vali' | 6,307 |
mslr_web/30k_fold4
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,919 |
'vali' | 6,306 |
mslr_web/30k_fold5
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,919 |
'vali' | 6,306 |
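The 30k split sizes differ by one query between folds because the underlying data is divided into five parts of nearly equal size, and each fold rotates which parts are used for training, validation and testing. As described on the MSLR homepage, fold k trains on parts k, k+1 and k+2, validates on part k+3 and tests on part k+4 (indices wrapping around after 5). The sketch below assumes that scheme together with part sizes inferred from the test split of each 30k fold on this page (S1 = 6,307 queries, S2 to S5 = 6,306 each); `fold_parts` and `fold_sizes` are hypothetical helpers. Under those assumptions it reproduces every 30k split count listed above.

```python
# Part sizes S1..S5 inferred from the 'test' split of each 30k fold (assumption).
PART_SIZES = [6_307, 6_306, 6_306, 6_306, 6_306]


def fold_parts(fold: int) -> dict:
    """Return the 0-based part indices used by each split of fold 1..5."""
    start = fold - 1
    parts = [(start + i) % 5 for i in range(5)]
    return {"train": parts[:3], "vali": [parts[3]], "test": [parts[4]]}


def fold_sizes(fold: int) -> dict:
    """Number of queries per split, summed over the parts each split uses."""
    return {split: sum(PART_SIZES[p] for p in idx)
            for split, idx in fold_parts(fold).items()}


for k in range(1, 6):
    print(f"30k_fold{k}", fold_sizes(k))
```

Because every part appears exactly once per fold, each fold covers all 31,531 queries; only the one larger part (S1) moves between splits.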