KITTI contains a suite of vision tasks built using an autonomous driving platform. The full benchmark covers many tasks, such as stereo, optical flow, and visual odometry. This dataset is the object detection subset of the benchmark and includes the monocular images and their bounding boxes: 7,481 training images annotated with 3D bounding boxes. A full description of the annotations can be found in the readme of the object development kit on the KITTI homepage.

  • Splits:
Split           Examples
'test'               711
'train'            6,347
'validation'         423
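The three split sizes above add up to 7,481, the number of annotated training images, which is consistent with all three splits being drawn from the official KITTI training set. The short sketch below checks that arithmetic and shows one way to load a split. `load_split` is a hypothetical helper; it assumes the `tensorflow_datasets` package is installed and that the dataset is registered under the name `"kitti"`. TFDS is imported inside the function so the arithmetic runs without it.

```python
# Split sizes as listed in the table above.
SPLIT_SIZES = {"train": 6347, "validation": 423, "test": 711}

# Should equal the 7,481 annotated training images.
TOTAL = sum(SPLIT_SIZES.values())


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split(name):
    """Hypothetical helper: load one KITTI split with TensorFlow Datasets.

    TFDS is imported lazily so the rest of this sketch runs without it;
    calling this downloads and prepares the dataset on first use.
    """
    import tensorflow_datasets as tfds

    return tfds.load("kitti", split=name)


if __name__ == "__main__":
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name:>10}: {SPLIT_SIZES[name]:>5} examples ({frac:.1%})")
```

Keeping the split sizes in a plain dict makes it easy to sanity-check that a loaded split has the expected number of examples before training.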
  • Feature structure:
FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'image/file_name': Text(shape=(), dtype=tf.string),
    'objects': Sequence({
        'alpha': tf.float32,
        'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
        'dimensions': Tensor(shape=(3,), dtype=tf.float32),
        'location': Tensor(shape=(3,), dtype=tf.float32),
        'occluded': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
        'rotation_y': tf.float32,
        'truncated': tf.float32,
        'type': ClassLabel(shape=(), dtype=tf.int64, num_classes=8),
    }),
})
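Because `objects` is a `Sequence` of a dict, each example delivers it as a dict of arrays whose leading dimension is the number of objects in the image, rather than as a list of per-object records. The sketch below regroups those parallel arrays into one dict per object. `unstack_objects` is a hypothetical helper, and the NumPy example is made up to mimic the structure above; with real data, the same dict would come from iterating `tfds.as_numpy(...)` over a loaded split.

```python
import numpy as np


def unstack_objects(objects):
    """Turn a dict of per-field arrays (leading dim = number of objects)
    into a list with one dict per object."""
    num_objects = len(objects["type"])
    # Every field must share the same leading dimension.
    for key, value in objects.items():
        if len(value) != num_objects:
            raise ValueError(
                f"field {key!r} has {len(value)} entries, expected {num_objects}"
            )
    return [{key: value[i] for key, value in objects.items()} for i in range(num_objects)]


# Made-up values shaped like the 'objects' feature for an image with two objects.
example_objects = {
    "alpha": np.array([-1.57, 0.25], dtype=np.float32),
    "bbox": np.array([[0.40, 0.10, 0.60, 0.25], [0.45, 0.50, 0.55, 0.58]], dtype=np.float32),
    "dimensions": np.array([[1.5, 1.6, 3.9], [1.7, 0.6, 0.8]], dtype=np.float32),
    "location": np.array([[-3.0, 1.7, 12.0], [2.0, 1.6, 20.0]], dtype=np.float32),
    "occluded": np.array([0, 2], dtype=np.int64),
    "rotation_y": np.array([-1.82, 0.35], dtype=np.float32),
    "truncated": np.array([0.0, 0.3], dtype=np.float32),
    "type": np.array([0, 3], dtype=np.int64),
}

per_object = unstack_objects(example_objects)
for obj in per_object:
    # Each record holds scalars for alpha/occluded/... and small vectors for bbox/location.
    print(obj["type"], obj["bbox"].shape, obj["location"])
```

Working with the parallel arrays directly is usually faster for batch operations; the per-object view is mainly convenient for inspection and debugging.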
  • Feature documentation:
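Feature             Class        Shape            Dtype       Description
image               Image        (None, None, 3)  tf.uint8    Monocular camera image
image/file_name     Text         ()               tf.string   Image file name
objects             Sequence                                  Annotated objects in the image
objects/alpha       Tensor       ()               tf.float32  Observation angle of the object, in [-pi, pi]
objects/bbox        BBoxFeature  (4,)             tf.float32  2D bounding box of the object in the image
objects/dimensions  Tensor       (3,)             tf.float32  3D object dimensions, in meters
objects/location    Tensor       (3,)             tf.float32  3D object location (x, y, z) in camera coordinates, in meters
objects/occluded    ClassLabel   ()               tf.int64    Occlusion state: 0 fully visible, 1 partly occluded, 2 largely occluded, 3 unknown
objects/rotation_y  Tensor       ()               tf.float32  Rotation around the camera Y axis, in [-pi, pi]
objects/truncated   Tensor       ()               tf.float32  Truncation, from 0 (not truncated) to 1 (truncated)
objects/type        ClassLabel   ()               tf.int64    Object class (8 classes)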
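The `occluded` and `truncated` fields are two of the three criteria behind KITTI's Easy/Moderate/Hard difficulty levels; the third is a minimum 2D box height in pixels. The sketch below keeps only objects at or below a given occlusion level and truncation fraction, using a boolean mask over the parallel arrays. `filter_objects` is a hypothetical helper, not a TFDS function. Its default thresholds (occlusion at most "partly occluded", truncation at most 0.3) are modeled on the Moderate setting, and the box-height criterion is deliberately left out, so this is an approximation rather than the official difficulty split.

```python
import numpy as np


def filter_objects(objects, max_occluded=1, max_truncated=0.3):
    """Keep objects whose occlusion level and truncation fraction are within limits.

    `objects` is a dict of parallel arrays (leading dim = number of objects),
    matching the layout of the 'objects' Sequence feature.
    """
    keep = (objects["occluded"] <= max_occluded) & (objects["truncated"] <= max_truncated)
    # Apply the same mask to every field so the arrays stay aligned.
    return {key: value[keep] for key, value in objects.items()}


# Made-up example with four objects.
example_objects = {
    "occluded": np.array([0, 1, 2, 0], dtype=np.int64),
    "truncated": np.array([0.0, 0.2, 0.0, 0.5], dtype=np.float32),
    "type": np.array([0, 0, 3, 5], dtype=np.int64),
    "bbox": np.zeros((4, 4), dtype=np.float32),
}

kept = filter_objects(example_objects)
print(kept["type"])  # -> [0 0]  (object 2 is too occluded, object 3 too truncated)
```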


  • Citation:
@inproceedings{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}