cifar10_1

The CIFAR-10.1 dataset is a new test set for CIFAR-10. CIFAR-10.1 contains roughly 2,000 new test images that were sampled after multiple years of research on the original CIFAR-10 dataset. The data collection for CIFAR-10.1 was designed to minimize distribution shift relative to the original dataset. We describe the creation of CIFAR-10.1 in the paper "Do CIFAR-10 Classifiers Generalize to CIFAR-10?". The images in CIFAR-10.1 are a subset of the TinyImages dataset. There are currently two versions of the CIFAR-10.1 dataset: v4 and v6.

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
})
  • Feature documentation:
Feature Class Shape Dtype Description
FeaturesDict
image Image (32, 32, 3) uint8
label ClassLabel int64
@article{recht2018cifar10.1,
  author = {Benjamin Recht and Rebecca Roelofs and Ludwig Schmidt and Vaishaal Shankar},
  title = {Do CIFAR-10 Classifiers Generalize to CIFAR-10?},
  year = {2018},
  note = {\url{https://arxiv.org/abs/1806.00451} },
}

@article{torralba2008tinyimages,
  author = {Antonio Torralba and Rob Fergus and William T. Freeman},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title = {80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition},
  year = {2008},
  volume = {30},
  number = {11},
  pages = {1958-1970}
}

cifar10_1/v4 (default config)

  • Config description: It is the first version of our dataset on which we tested any classifier. As mentioned above, this makes the v4 dataset independent of the classifiers we evaluate. The numbers reported in the main sections of our paper use this version of the dataset. It was built from the top 25 TinyImages keywords for each class, which led to a slight class imbalance. The largest difference is that ships make up only 8% of the test set instead of 10%. v4 contains 2,021 images.

  • Download size: 5.93 MiB

  • Dataset size: 4.46 MiB

  • Splits:

Split Examples
'test' 2,021

Visualization

cifar10_1/v6

  • Config description: It is derived from a slightly improved keyword allocation that is exactly class balanced. This version of the dataset corresponds to the results in Appendix D of our paper. v6 contains 2,000 images.

  • Download size: 5.87 MiB

  • Dataset size: 4.40 MiB

  • Splits:

Split Examples
'test' 2,000

Visualization