Criteo Uplift Modeling Dataset

This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab) and Massih-Reza Amini (LIG, Grenoble INP).

The paper was published at the AdKDD 2018 Workshop, held in conjunction with KDD 2018.

Data description

This dataset is constructed by assembling data from several incrementality tests, a particular randomized trial procedure in which a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each representing a user with 12 features (f0 to f11), a treatment indicator, and 2 labels (visits and conversions).


Here is a detailed description of the fields (they are comma-separated in the file):

  • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
  • treatment: treatment group (1 = treated, 0 = control)
  • conversion: whether a conversion occurred for this user (binary, label)
  • visit: whether a visit occurred for this user (binary, label)
  • exposure: treatment effect, whether the user has been effectively exposed (binary)
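
The field list above can be turned into a small typed reader. The following is a minimal sketch using only the Python standard library; it assumes the CSV starts with a header line carrying the field names listed above, and the helper name `read_rows` and the two sample rows are made up for illustration (they are not real data from the dataset).

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Union

# Field names as listed in the data description.
FEATURES = [f"f{i}" for i in range(12)]  # f0 .. f11, dense floats
BINARY_FIELDS = ["treatment", "conversion", "visit", "exposure"]

Row = Dict[str, Union[float, int]]


def read_rows(handle: TextIO) -> Iterator[Row]:
    """Yield one typed dict per CSV row.

    Assumption: the first line of the file is a header with the field names.
    """
    for raw in csv.DictReader(handle):
        row: Row = {name: float(raw[name]) for name in FEATURES}
        # int(float(...)) accepts both "1" and "1.0" style encodings.
        row.update({name: int(float(raw[name])) for name in BINARY_FIELDS})
        yield row


# Two synthetic rows, only to show the expected layout.
header = ",".join(FEATURES + BINARY_FIELDS)
treated_row = ",".join(["0.5"] * 12 + ["1", "0", "1", "1"])
control_row = ",".join(["1.25"] * 12 + ["0", "0", "0", "0"])
sample = io.StringIO("\n".join([header, treated_row, control_row]) + "\n")

rows = list(read_rows(sample))
print(len(rows), rows[0]["treatment"], rows[1]["f11"])
```

Because `read_rows` is a generator, the same function can stream a real file opened with `open(path, newline="")` without holding all rows in memory; for the compressed download, wrapping the file with `gzip.open(path, "rt", newline="")` should work if the archive is gzip-compressed, which is an assumption here.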

Key figures

  • Format: CSV
  • Size: 459MB (compressed)
  • Rows: 25,309,483
  • Average Visit Rate: 0.04132
  • Average Conversion Rate: 0.00229
  • Treatment Ratio: 0.846
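
The rates above are plain averages of the binary columns, and because treatment was assigned at random, the same counts also give a simple average uplift estimate as the difference in outcome rates between treated and control users. The sketch below illustrates this on toy rows; the function name `summarize` and the toy data are hypothetical, and the difference-in-means estimator is a standard choice for randomized trials rather than a method prescribed by the dataset authors.

```python
from typing import Dict, Iterable, Mapping


def summarize(rows: Iterable[Mapping[str, int]]) -> Dict[str, float]:
    """Compute overall rates and difference-in-means uplift.

    Each row needs the binary fields 'treatment', 'visit' and 'conversion'.
    Assumes both groups are non-empty (otherwise a division by zero occurs).
    """
    count = {0: 0, 1: 0}        # users per group (0 = control, 1 = treated)
    visits = {0: 0, 1: 0}       # visits per group
    conversions = {0: 0, 1: 0}  # conversions per group
    for row in rows:
        group = row["treatment"]
        count[group] += 1
        visits[group] += row["visit"]
        conversions[group] += row["conversion"]

    total = count[0] + count[1]
    return {
        "visit_rate": (visits[0] + visits[1]) / total,
        "conversion_rate": (conversions[0] + conversions[1]) / total,
        "treatment_ratio": count[1] / total,
        # Average uplift: treated rate minus control rate.
        "visit_uplift": visits[1] / count[1] - visits[0] / count[0],
        "conversion_uplift": conversions[1] / count[1] - conversions[0] / count[0],
    }


# Toy data (not from the dataset): 4 treated users, 2 control users.
toy_rows = [
    {"treatment": 1, "visit": 1, "conversion": 1},
    {"treatment": 1, "visit": 1, "conversion": 0},
    {"treatment": 1, "visit": 1, "conversion": 0},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 0, "visit": 1, "conversion": 0},
    {"treatment": 0, "visit": 0, "conversion": 0},
]
stats = summarize(toy_rows)
print(stats)
```

On the toy rows, the treated visit rate is 3/4 and the control visit rate is 1/2, so the visit uplift is 0.25. Applied to the full file, the first three entries should correspond to the key figures listed above.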


The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally, we can foresee related usages, such as but not limited to:

Split      Examples
'train'    13,979,592
  • Feature structure:
    'conversion': bool,
    'exposure': bool,
    'f0': float32,
    'f1': float32,
    'f10': float32,
    'f11': float32,
    'f2': float32,
    'f3': float32,
    'f4': float32,
    'f5': float32,
    'f6': float32,
    'f7': float32,
    'f8': float32,
    'f9': float32,
    'treatment': int64,
    'visit': bool,
  • Citation:
author = {{Diemert Eustache, Betlei Artem} and Renaudin, Christophe and Massih-Reza, Amini},
title = {A Large Scale Benchmark for Uplift Modeling},
publisher = {ACM},
booktitle = {Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August, 20, 2018},
year = {2018}