The CIFAR-10.1 dataset is a new test set for CIFAR-10. CIFAR-10.1 contains roughly 2,000 new test images that were sampled after multiple years of research on the original CIFAR-10 dataset. The data collection for CIFAR-10.1 was designed to minimize distribution shift relative to the original dataset. We describe the creation of CIFAR-10.1 in the paper "Do CIFAR-10 Classifiers Generalize to CIFAR-10?". The images in CIFAR-10.1 are a subset of the TinyImages dataset. There are currently two versions of the CIFAR-10.1 dataset: v4 and v6.

    FeaturesDict({
        'image': Image(shape=(32, 32, 3), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
    })
Feature  Class       Shape        Dtype
image    Image       (32, 32, 3)  uint8
label    ClassLabel  ()           int64
cifar10_1/v4 (default config)

  • Config description: v4 is the first version of the dataset on which we tested any classifier, which makes it independent of the classifiers we evaluate. The numbers reported in the main sections of our paper use this version. It was built from the top 25 TinyImages keywords for each class, which led to a slight class imbalance. The largest difference is that ships make up only 8% of the test set instead of 10%. v4 contains 2,021 images.

  • Download size: 5.93 MiB

  • Dataset size: 4.46 MiB

  • Splits: 'test' (2,021 examples)
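
The class imbalance mentioned in the config description can be checked by counting labels. The following is a minimal standard-library sketch; `class_fractions` is a hypothetical helper introduced here for illustration, and the toy labels are made up rather than taken from the dataset.

```python
from collections import Counter


def class_fractions(labels, num_classes=10):
    """Return the fraction of examples per class, indexed by integer label.

    `labels` is any iterable of integer class ids in [0, num_classes).
    Hypothetical helper for illustration; not part of the dataset tooling.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    # Classes that never occur get a fraction of 0.0.
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Toy example with made-up labels: two of four examples belong to class 0.
toy_fractions = class_fractions([0, 0, 1, 2], num_classes=3)
print(toy_fractions)  # [0.5, 0.25, 0.25]
```

Applied to the labels of the v4 config, a perfectly balanced set would give 0.10 for every class; according to the description above, the ship class should come out near 0.08 instead.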

cifar10_1/v6

  • Config description: v6 is derived from a slightly improved keyword allocation that is exactly class-balanced. This version of the dataset corresponds to the results in Appendix D of our paper. v6 contains 2,000 images.

  • Download size: 5.87 MiB

  • Dataset size: 4.40 MiB

  • Splits: 'test' (2,000 examples)
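
Either config can be selected by name when loading. The sketch below assumes the dataset is available through the `tensorflow_datasets` package under the config names `cifar10_1/v4` and `cifar10_1/v6`, and that the single split is named `test`; `config_name` and `load_cifar10_1` are hypothetical helper names used only for this example. The import is done lazily so the name helper works without the package installed.

```python
VERSIONS = ("v4", "v6")  # the two dataset versions described above


def config_name(version="v4"):
    """Build the config string for a version; v4 is the default config."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version {version!r}; expected one of {VERSIONS}")
    return f"cifar10_1/{version}"


def load_cifar10_1(version="v4"):
    """Load one config as an iterable of NumPy examples.

    Assumptions: tensorflow_datasets is installed, the data can be
    downloaded, and the only split is called "test".
    """
    import tensorflow_datasets as tfds  # imported lazily on purpose

    ds = tfds.load(config_name(version), split="test")
    # Each example is a dict with an 'image' array of shape (32, 32, 3),
    # dtype uint8, and an integer 'label'.
    return tfds.as_numpy(ds)


print(config_name())      # cifar10_1/v4
print(config_name("v6"))  # cifar10_1/v6
```

Calling `load_cifar10_1("v6")` would then be expected to yield the 2,000 class-balanced examples, and the default call the 2,021 examples of v4.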

