cityscapes

  • Description:

Cityscapes is a dataset of diverse urban street scenes recorded in 50 different cities at varying times of the year. It includes ground truths for several vision tasks: semantic segmentation, instance-level segmentation (TODO), and stereo-pair disparity inference.

For segmentation tasks (default config, accessible via 'cityscapes/semantic_segmentation'), Cityscapes provides dense pixel-level annotations for 5000 images at 1024 x 2048 resolution, pre-split into training (2975), validation (500) and test (1525) sets. Label annotations for segmentation tasks span 30+ classes commonly encountered in driving scene perception. Detailed label information may be found here: https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/helpers/labels.py#L52-L99

Cityscapes also provides coarse-grained segmentation annotations (accessible via 'cityscapes/semantic_segmentation_extra') for 19998 images in a 'train_extra' split, which may prove useful for pretraining or for data-heavy models.

Besides segmentation, Cityscapes also provides stereo image pairs and ground truths for disparity inference tasks on both the normal and extra splits (accessible via 'cityscapes/stereo_disparity' and 'cityscapes/stereo_disparity_extra' respectively).
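The four config names above follow a simple pattern (task name plus an optional '_extra' suffix). A minimal sketch of building them and passing them to `tfds.load` is shown below. The helpers `config_name` and `load_cityscapes` are hypothetical names introduced for illustration, not part of TFDS, and the sketch assumes the Cityscapes archives have already been obtained and prepared locally.

```python
# Task names taken from the config names quoted in the description.
TASKS = ("semantic_segmentation", "stereo_disparity")


def config_name(task: str, extra: bool = False) -> str:
    """Build a 'cityscapes/<config>' name (hypothetical helper, not part of TFDS)."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    suffix = "_extra" if extra else ""
    return f"cityscapes/{task}{suffix}"


def load_cityscapes(task: str, split: str = "train", extra: bool = False):
    """Load one split via tfds.load.

    Assumption: the dataset files are already available locally; this
    sketch does not handle downloading or preparing them.
    """
    # Imported lazily so config_name() can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(config_name(task, extra), split=split)


if __name__ == "__main__":
    print(config_name("semantic_segmentation"))
    print(config_name("stereo_disparity", extra=True))
```

For example, `load_cityscapes("stereo_disparity", split="validation")` would request the 'validation' split of 'cityscapes/stereo_disparity'.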

Ignored examples:

  • For 'cityscapes/stereo_disparity_extra':
    • troisdorf_000000000073{*} images (no disparity map present)

  • Citation:

@inproceedings{Cordts2016Cityscapes,
  title={The Cityscapes Dataset for Semantic Urban Scene Understanding},
  author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
  booktitle={Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}

cityscapes/semantic_segmentation (default config)

  • Config description: Cityscapes semantic segmentation dataset.

  • Dataset size: 10.86 GiB

  • Splits:

Split          Examples
'test'         1,525
'train'        2,975
'validation'   500
  • Feature structure:
FeaturesDict({
    'image_id': Text(shape=(), dtype=string),
    'image_left': Image(shape=(1024, 2048, 3), dtype=uint8),
    'segmentation_label': Image(shape=(1024, 2048, 1), dtype=uint8),
})
  • Feature documentation:
Feature             Class         Shape            Dtype   Description
                    FeaturesDict
image_id            Text                           string
image_left          Image         (1024, 2048, 3)  uint8
segmentation_label  Image         (1024, 2048, 1)  uint8
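
The shapes and dtypes above determine how much memory a decoded example occupies, which is useful when choosing batch sizes or cache strategies. The sketch below works through that arithmetic using only the numbers listed for this config; the names `FEATURE_SHAPES`, `SPLIT_SIZES`, and `decoded_bytes_per_example` are introduced here for illustration, and the small 'image_id' string is ignored.

```python
from math import prod

# Image feature shapes from the feature structure above (all uint8, 1 byte per element).
FEATURE_SHAPES = {
    "image_left": (1024, 2048, 3),
    "segmentation_label": (1024, 2048, 1),
}

# Split sizes from the splits table above.
SPLIT_SIZES = {"test": 1525, "train": 2975, "validation": 500}


def decoded_bytes_per_example(shapes=FEATURE_SHAPES) -> int:
    """Sum of element counts across image features; uint8 means bytes == elements."""
    return sum(prod(shape) for shape in shapes.values())


per_example = decoded_bytes_per_example()
total_gib = per_example * sum(SPLIT_SIZES.values()) / 2**30

print(per_example / 2**20, "MiB per decoded example")
print(round(total_gib, 2), "GiB for all splits when fully decoded")
```

This gives 8 MiB per decoded example and about 39 GiB across all 5,000 examples, compared with the 10.86 GiB listed as the dataset size.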

cityscapes/semantic_segmentation_extra

  • Config description: Cityscapes semantic segmentation dataset with train_extra split and coarse labels.

  • Dataset size: 51.92 GiB

  • Splits:

Split          Examples
'train'        2,975
'train_extra'  19,998
'validation'   500
  • Feature structure:
FeaturesDict({
    'image_id': Text(shape=(), dtype=string),
    'image_left': Image(shape=(1024, 2048, 3), dtype=uint8),
    'segmentation_label': Image(shape=(1024, 2048, 1), dtype=uint8),
})
  • Feature documentation:
Feature             Class         Shape            Dtype   Description
                    FeaturesDict
image_id            Text                           string
image_left          Image         (1024, 2048, 3)  uint8
segmentation_label  Image         (1024, 2048, 1)  uint8
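
Since the description suggests 'train_extra' for pretraining, one plausible use is to read 'train' and 'train_extra' together. TFDS split strings can concatenate splits with '+', and the sketch below builds such a string and checks the expected example count against the table above. The helper names `combined_split` and `expected_examples` are hypothetical and not part of TFDS.

```python
# Split sizes for 'cityscapes/semantic_segmentation_extra', from the table above.
EXTRA_SPLIT_SIZES = {"train": 2975, "train_extra": 19998, "validation": 500}


def combined_split(*names: str) -> str:
    """Join split names with '+', the TFDS split-string syntax for concatenation."""
    if not names:
        raise ValueError("at least one split name is required")
    return "+".join(names)


def expected_examples(split_spec: str, sizes=EXTRA_SPLIT_SIZES) -> int:
    """Number of examples a '+'-joined split string should yield."""
    return sum(sizes[name] for name in split_spec.split("+"))


spec = combined_split("train", "train_extra")
print(spec, expected_examples(spec))
```

The resulting string could then be passed as the `split` argument of `tfds.load` for this config, again assuming the data has been prepared locally.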

cityscapes/stereo_disparity

  • Config description: Cityscapes stereo image and disparity maps dataset.

  • Dataset size: 25.03 GiB

  • Splits:

Split          Examples
'test'         1,525
'train'        2,975
'validation'   500
  • Feature structure:
FeaturesDict({
    'disparity_map': Image(shape=(1024, 2048, 1), dtype=uint8),
    'image_id': Text(shape=(), dtype=string),
    'image_left': Image(shape=(1024, 2048, 3), dtype=uint8),
    'image_right': Image(shape=(1024, 2048, 3), dtype=uint8),
})
  • Feature documentation:
Feature        Class         Shape            Dtype   Description
               FeaturesDict
disparity_map  Image         (1024, 2048, 1)  uint8
image_id       Text                           string
image_left     Image         (1024, 2048, 3)  uint8
image_right    Image         (1024, 2048, 3)  uint8

cityscapes/stereo_disparity_extra

  • Config description: Cityscapes stereo image and disparity maps dataset with train_extra split.

  • Dataset size: 119.18 GiB

  • Splits:

Split          Examples
'train'        2,975
'train_extra'  19,997
'validation'   500
  • Feature structure:
FeaturesDict({
    'disparity_map': Image(shape=(1024, 2048, 1), dtype=uint8),
    'image_id': Text(shape=(), dtype=string),
    'image_left': Image(shape=(1024, 2048, 3), dtype=uint8),
    'image_right': Image(shape=(1024, 2048, 3), dtype=uint8),
})
  • Feature documentation:
Feature        Class         Shape            Dtype   Description
               FeaturesDict
disparity_map  Image         (1024, 2048, 1)  uint8
image_id       Text                           string
image_left     Image         (1024, 2048, 3)  uint8
image_right    Image         (1024, 2048, 3)  uint8