- Description:
MSLR-WEB comprises two large-scale Learning-to-Rank datasets released by Microsoft Research. The first dataset (called "30k") contains 30,000 queries and the second dataset (called "10k") contains 10,000 queries. Each dataset consists of query-document pairs represented as feature vectors, together with the corresponding relevance judgment labels.
You can specify whether to use the "10k" or "30k" version of the dataset, and a corresponding fold, as follows:
import tensorflow_datasets as tfds
ds = tfds.load("mslr_web/30k_fold1")
If only mslr_web is specified, the mslr_web/10k_fold1 option is selected by default:
# This is the same as `tfds.load("mslr_web/10k_fold1")`
ds = tfds.load("mslr_web")
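The config names on this page follow the pattern mslr_web/<size>_fold<n>, with sizes "10k" and "30k" and folds 1 through 5. As a minimal sketch, the hypothetical helpers below (mslr_config_name and load_mslr are illustrative names, not part of TFDS) build and validate such a name before passing it to tfds.load:

```python
VALID_SIZES = ("10k", "30k")
VALID_FOLDS = (1, 2, 3, 4, 5)


def mslr_config_name(size="10k", fold=1):
    """Build a config string such as "mslr_web/30k_fold2" (hypothetical helper)."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if fold not in VALID_FOLDS:
        raise ValueError(f"fold must be one of {VALID_FOLDS}, got {fold!r}")
    return f"mslr_web/{size}_fold{fold}"


def load_mslr(size="10k", fold=1, split="train"):
    """Load one split of the chosen config; the data is downloaded on first use."""
    # Imported lazily so that the name helper works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(mslr_config_name(size, fold), split=split)
```

For example, load_mslr("30k", 2, split="vali") would request the 'vali' split of mslr_web/30k_fold2.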
Homepage: https://www.microsoft.com/en-us/research/project/mslr/
Source code: tfds.ranking.mslr_web.MslrWeb
Versions:
- 1.0.0 (default): Initial release.
- Auto-cached (documentation): No
Features:
FeaturesDict({
'bm25_anchor': Tensor(shape=(None,), dtype=tf.float64),
'bm25_body': Tensor(shape=(None,), dtype=tf.float64),
'bm25_title': Tensor(shape=(None,), dtype=tf.float64),
'bm25_url': Tensor(shape=(None,), dtype=tf.float64),
'bm25_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'boolean_model_anchor': Tensor(shape=(None,), dtype=tf.float64),
'boolean_model_body': Tensor(shape=(None,), dtype=tf.float64),
'boolean_model_title': Tensor(shape=(None,), dtype=tf.float64),
'boolean_model_url': Tensor(shape=(None,), dtype=tf.float64),
'boolean_model_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_number_anchor': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_number_body': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_number_title': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_number_url': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_number_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_ratio_anchor': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_ratio_body': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_ratio_title': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_ratio_url': Tensor(shape=(None,), dtype=tf.float64),
'covered_query_term_ratio_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'idf_anchor': Tensor(shape=(None,), dtype=tf.float64),
'idf_body': Tensor(shape=(None,), dtype=tf.float64),
'idf_title': Tensor(shape=(None,), dtype=tf.float64),
'idf_url': Tensor(shape=(None,), dtype=tf.float64),
'idf_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'inlink_number': Tensor(shape=(None,), dtype=tf.float64),
'label': Tensor(shape=(None,), dtype=tf.float64),
'length_of_url': Tensor(shape=(None,), dtype=tf.float64),
'lmir_abs_anchor': Tensor(shape=(None,), dtype=tf.float64),
'lmir_abs_body': Tensor(shape=(None,), dtype=tf.float64),
'lmir_abs_title': Tensor(shape=(None,), dtype=tf.float64),
'lmir_abs_url': Tensor(shape=(None,), dtype=tf.float64),
'lmir_abs_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'lmir_dir_anchor': Tensor(shape=(None,), dtype=tf.float64),
'lmir_dir_body': Tensor(shape=(None,), dtype=tf.float64),
'lmir_dir_title': Tensor(shape=(None,), dtype=tf.float64),
'lmir_dir_url': Tensor(shape=(None,), dtype=tf.float64),
'lmir_dir_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'lmir_jm_anchor': Tensor(shape=(None,), dtype=tf.float64),
'lmir_jm_body': Tensor(shape=(None,), dtype=tf.float64),
'lmir_jm_title': Tensor(shape=(None,), dtype=tf.float64),
'lmir_jm_url': Tensor(shape=(None,), dtype=tf.float64),
'lmir_jm_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'max_of_stream_length_normalized_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'max_of_stream_length_normalized_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'max_of_stream_length_normalized_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'max_of_stream_length_normalized_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'max_of_stream_length_normalized_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'max_of_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'max_of_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'max_of_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'max_of_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'max_of_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'max_of_tf_idf_anchor': Tensor(shape=(None,), dtype=tf.float64),
'max_of_tf_idf_body': Tensor(shape=(None,), dtype=tf.float64),
'max_of_tf_idf_title': Tensor(shape=(None,), dtype=tf.float64),
'max_of_tf_idf_url': Tensor(shape=(None,), dtype=tf.float64),
'max_of_tf_idf_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_stream_length_normalized_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_stream_length_normalized_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_stream_length_normalized_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_stream_length_normalized_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_stream_length_normalized_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_tf_idf_anchor': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_tf_idf_body': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_tf_idf_title': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_tf_idf_url': Tensor(shape=(None,), dtype=tf.float64),
'mean_of_tf_idf_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'min_of_stream_length_normalized_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'min_of_stream_length_normalized_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'min_of_stream_length_normalized_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'min_of_stream_length_normalized_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'min_of_stream_length_normalized_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'min_of_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'min_of_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'min_of_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'min_of_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'min_of_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'min_of_tf_idf_anchor': Tensor(shape=(None,), dtype=tf.float64),
'min_of_tf_idf_body': Tensor(shape=(None,), dtype=tf.float64),
'min_of_tf_idf_title': Tensor(shape=(None,), dtype=tf.float64),
'min_of_tf_idf_url': Tensor(shape=(None,), dtype=tf.float64),
'min_of_tf_idf_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'number_of_slash_in_url': Tensor(shape=(None,), dtype=tf.float64),
'outlink_number': Tensor(shape=(None,), dtype=tf.float64),
'page_rank': Tensor(shape=(None,), dtype=tf.float64),
'quality_score': Tensor(shape=(None,), dtype=tf.float64),
'quality_score_2': Tensor(shape=(None,), dtype=tf.float64),
'query_url_click_count': Tensor(shape=(None,), dtype=tf.float64),
'site_rank': Tensor(shape=(None,), dtype=tf.float64),
'stream_length_anchor': Tensor(shape=(None,), dtype=tf.float64),
'stream_length_body': Tensor(shape=(None,), dtype=tf.float64),
'stream_length_title': Tensor(shape=(None,), dtype=tf.float64),
'stream_length_url': Tensor(shape=(None,), dtype=tf.float64),
'stream_length_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_stream_length_normalized_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_stream_length_normalized_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_stream_length_normalized_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_stream_length_normalized_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_stream_length_normalized_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_tf_idf_anchor': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_tf_idf_body': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_tf_idf_title': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_tf_idf_url': Tensor(shape=(None,), dtype=tf.float64),
'sum_of_tf_idf_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'url_click_count': Tensor(shape=(None,), dtype=tf.float64),
'url_dwell_time': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_stream_length_normalized_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_stream_length_normalized_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_stream_length_normalized_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_stream_length_normalized_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_stream_length_normalized_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_term_frequency_anchor': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_term_frequency_body': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_term_frequency_title': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_term_frequency_url': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_term_frequency_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_tf_idf_anchor': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_tf_idf_body': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_tf_idf_title': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_tf_idf_url': Tensor(shape=(None,), dtype=tf.float64),
'variance_of_tf_idf_whole_document': Tensor(shape=(None,), dtype=tf.float64),
'vector_space_model_anchor': Tensor(shape=(None,), dtype=tf.float64),
'vector_space_model_body': Tensor(shape=(None,), dtype=tf.float64),
'vector_space_model_title': Tensor(shape=(None,), dtype=tf.float64),
'vector_space_model_url': Tensor(shape=(None,), dtype=tf.float64),
'vector_space_model_whole_document': Tensor(shape=(None,), dtype=tf.float64),
})
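Every feature above has shape (None,), which suggests that each example holds one value per document for a single query. Under that assumption, the sketch below turns one example dict into a list of per-document rows. The function name feature_matrix is hypothetical, and the toy values in the usage note are made up for illustration rather than taken from the dataset.

```python
def feature_matrix(example, exclude=("label",)):
    """Return (feature_names, rows) with one row of floats per document.

    `example` maps feature names to equal-length sequences (lists or arrays).
    Keys listed in `exclude` are skipped, so the label stays out of the matrix.
    """
    names = sorted(k for k in example if k not in exclude)
    if not names:
        return names, []
    num_docs = len(example[names[0]])
    for name in names:
        if len(example[name]) != num_docs:
            raise ValueError(f"feature {name!r} has a different length")
    # Column order follows the alphabetically sorted feature names.
    rows = [[float(example[name][i]) for name in names] for i in range(num_docs)]
    return names, rows
```

With a toy example such as {"bm25_body": [1.0, 2.0], "page_rank": [0.5, 0.25], "label": [0.0, 3.0]}, the function returns the names ["bm25_body", "page_rank"] and the rows [[1.0, 0.5], [2.0, 0.25]]. Dicts of NumPy arrays, such as those yielded by tfds.as_numpy(ds), should work the same way.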
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
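Because no supervised keys are defined, loading with as_supervised=True is not expected to yield (input, label) pairs for this dataset. One possible workaround, sketched here with a hypothetical helper named split_label, is to separate the 'label' entry from the remaining features yourself:

```python
def split_label(example, label_key="label"):
    """Return (features, label) without mutating the input dict."""
    features = {k: v for k, v in example.items() if k != label_key}
    return features, example[label_key]
```

Assuming ds is a dataset of feature dicts as returned by tfds.load, ds.map(split_label) should produce (features, label) tuples suitable for supervised training loops.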
@article{DBLP:journals/corr/QinL13,
author = {Tao Qin and Tie{-}Yan Liu},
title = {Introducing {LETOR} 4.0 Datasets},
journal = {CoRR},
volume = {abs/1306.2597},
year = {2013},
url = {http://arxiv.org/abs/1306.2597},
timestamp = {Mon, 01 Jul 2013 20:31:25 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/QinL13},
bibsource = {dblp computer science bibliography, http://dblp.org}
}
mslr_web/10k_fold1 (default config)
Download size: 1.15 GiB
Dataset size: 381.58 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
- Examples (tfds.as_dataframe):
mslr_web/10k_fold2
Download size: 1.15 GiB
Dataset size: 381.58 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
- Examples (tfds.as_dataframe):
mslr_web/10k_fold3
Download size: 1.15 GiB
Dataset size: 381.58 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
- Examples (tfds.as_dataframe):
mslr_web/10k_fold4
Download size: 1.15 GiB
Dataset size: 381.58 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
- Examples (tfds.as_dataframe):
mslr_web/10k_fold5
Download size: 1.15 GiB
Dataset size: 381.58 MiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
- Examples (tfds.as_dataframe):
mslr_web/30k_fold1
Download size: 3.59 GiB
Dataset size: 1.17 GiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,919 |
'vali' | 6,306 |
- Examples (tfds.as_dataframe):
mslr_web/30k_fold2
Download size: 3.59 GiB
Dataset size: 1.17 GiB
Splits:
Split | Examples |
---|---|
'test' | 6,307 |
'train' | 18,918 |
'vali' | 6,306 |
- Examples (tfds.as_dataframe):
mslr_web/30k_fold3
Download size: 3.59 GiB
Dataset size: 1.17 GiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,918 |
'vali' | 6,307 |
- Examples (tfds.as_dataframe):
mslr_web/30k_fold4
Download size: 3.59 GiB
Dataset size: 1.17 GiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,919 |
'vali' | 6,306 |
- Examples (tfds.as_dataframe):
mslr_web/30k_fold5
Download size: 3.59 GiB
Dataset size: 1.17 GiB
Splits:
Split | Examples |
---|---|
'test' | 6,306 |
'train' | 18,919 |
'vali' | 6,306 |
- Examples (tfds.as_dataframe):