imagenet2012_subset

Imagenet2012Subset is a subset of the original ImageNet ILSVRC 2012 dataset. It shares the same validation set as the original dataset, but the training set is subsampled in a label-balanced fashion. In the 1pct configuration, 1% of the training images (12,811) are sampled: most classes have the same number of images (12.8 on average), and some randomly chosen classes have one more example than the others. In the 10pct configuration, ~10% of the training images (128,116) are sampled in the same way: most classes have the same number of images (128 on average), and some randomly chosen classes have one more example than the others.
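
The per-class arithmetic above can be checked with a short sketch. It assumes, as the description states, that every class receives either a base count or exactly one more image; the helper name `balanced_counts` is hypothetical and is not part of TFDS.

```python
NUM_CLASSES = 1000  # matches num_classes of the 'label' feature


def balanced_counts(total_images, num_classes=NUM_CLASSES):
    """Split total_images across classes so that counts differ by at most one.

    Returns (base, n_extra): n_extra classes hold base + 1 images and the
    remaining classes hold base images. This is an illustrative assumption
    about the sampling, not the code used to build the dataset.
    """
    base, n_extra = divmod(total_images, num_classes)
    return base, n_extra


for name, total in (("1pct", 12811), ("10pct", 128116)):
    base, n_extra = balanced_counts(total)
    print(
        f"{name}: {n_extra} classes with {base + 1} images, "
        f"{NUM_CLASSES - n_extra} classes with {base} images"
    )
```

Under this assumption, 1pct would consist of classes with 12 or 13 images and 10pct of classes with 128 or 129 images, which is consistent with the stated averages of 12.8 and 128.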

It is intended as a benchmark for semi-supervised learning and was originally used in the SimCLR paper (https://arxiv.org/abs/2002.05709).

  • Homepage: http://image-net.org/

  • Source code: tfds.datasets.imagenet2012_subset.Builder

  • Versions:

    • 2.0.0: Fix validation labels.
    • 2.0.1: Encoding fix. No changes from user point of view.
    • 3.0.0: Fix colorization on ~12 images (CMYK -> RGB). Fix format for consistency (convert the single PNG image to JPEG). Faster generation by reading directly from the archive.

    • 4.0.0: (unpublished)

    • 5.0.0 (default): New split API (https://tensorflow.org/datasets/splits)

    • 5.1.0: Added test split.

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    manual_dir should contain two files: ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar. You need to register on https://image-net.org/download-images in order to get the link to download the dataset.

  • Auto-cached (documentation): No

  • Feature structure:

FeaturesDict({
    'file_name': Text(shape=(), dtype=string),
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=1000),
})
  • Feature documentation:

Feature     Class         Shape            Dtype    Description
            FeaturesDict
file_name   Text                           string
image       Image         (None, None, 3)  uint8
label       ClassLabel                     int64

  • Citation:

@article{chen2020simple,
  title={A Simple Framework for Contrastive Learning of Visual Representations},
  author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2002.05709},
  year={2020}
}

@article{ILSVRC15,
  title={{ImageNet Large Scale Visual Recognition Challenge}},
  author={Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
  journal={International Journal of Computer Vision (IJCV)},
  year={2015},
  doi={10.1007/s11263-015-0816-y},
  volume={115},
  number={3},
  pages={211-252}
}
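
The manual download step described above can be sketched as follows. This is a minimal example, assuming `tensorflow_datasets` is installed and that the two archives have already been placed in the manual directory; the helper names `missing_archives` and `load_subset` are hypothetical. The file check uses only the standard library, and the TFDS import is deferred so the check runs without it.

```python
from pathlib import Path

# Archive names taken from the manual download instructions.
REQUIRED_ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def missing_archives(manual_dir):
    """Return the required archive names that are not present in manual_dir."""
    manual_dir = Path(manual_dir).expanduser()
    return [name for name in REQUIRED_ARCHIVES if not (manual_dir / name).is_file()]


def load_subset(manual_dir, config="1pct", split="train"):
    """Hypothetical wrapper: verify the archives, then call tfds.load."""
    missing = missing_archives(manual_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {manual_dir}: {', '.join(missing)}")

    # Imported lazily so the archive check above works without TFDS installed.
    import tensorflow_datasets as tfds

    download_config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    return tfds.load(
        f"imagenet2012_subset/{config}",
        split=split,
        download_and_prepare_kwargs={"download_config": download_config},
    )
```

For example, `load_subset("~/tensorflow_datasets/downloads/manual/", config="10pct")` would request the 10pct training split. Each returned example is a dictionary with the `file_name`, `image`, and `label` features listed in the feature structure.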

imagenet2012_subset/1pct (default config)

  • Config description: 1pct of total ImageNet training set.

  • Download size: 254.22 KiB

  • Dataset size: 7.61 GiB

  • Splits:

Split Examples
'train' 12,811
'validation' 50,000


imagenet2012_subset/10pct

  • Config description: 10pct of total ImageNet training set.

  • Download size: 2.48 MiB

  • Dataset size: 19.91 GiB

  • Splits:

Split Examples
'train' 128,116
'validation' 50,000
