Datasets

Usage

import tensorflow as tf
import tensorflow_datasets as tfds

# See all registered datasets
tfds.list_builders()

# Load a given dataset by name, along with the DatasetInfo
data, info = tfds.load("mnist", with_info=True)
train_data, test_data = data['train'], data['test']
assert isinstance(train_data, tf.data.Dataset)
assert info.features['label'].num_classes == 10
assert info.splits['train'].num_examples == 60000

# You can also access a builder directly
builder = tfds.builder("mnist")
assert builder.info.splits['train'].num_examples == 60000
builder.download_and_prepare()
datasets = builder.as_dataset()

# If you need NumPy arrays
np_datasets = tfds.as_numpy(datasets)
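
The returned datasets compose with the usual tf.data transformations. A minimal sketch of a training input pipeline (the shuffle buffer and batch sizes here are arbitrary choices, not recommendations):

import tensorflow as tf
import tensorflow_datasets as tfds

# Load MNIST as (image, label) tuples and build a simple input pipeline.
train_data = tfds.load("mnist", split="train", as_supervised=True)
train_data = (train_data
              .shuffle(1024)  # arbitrary buffer size
              .batch(32)
              .prefetch(tf.data.experimental.AUTOTUNE))

for images, labels in tfds.as_numpy(train_data.take(1)):
    print(images.shape, labels.shape)  # (32, 28, 28, 1) (32,)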

All Datasets


audio

"nsynth"

The NSynth Dataset is an audio dataset containing ~300k musical notes, each with a unique pitch, timbre, and envelope. Each note is annotated with three additional pieces of information based on a combination of human evaluation and heuristic algorithms:

  • Source: the method of sound production for the note's instrument.

  • Family: the high-level family of which the note's instrument is a member.

  • Qualities: sonic qualities of the note.

The dataset is split into train, valid, and test sets, with no instruments overlapping between the train set and the valid/test sets.

Features

FeaturesDict({
    'audio': Tensor(shape=(64000,), dtype=tf.float32),
    'id': Tensor(shape=(), dtype=tf.string),
    'instrument': FeaturesDict({
        'family': ClassLabel(shape=(), dtype=tf.int64, num_classes=11),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1006),
        'source': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    }),
    'pitch': ClassLabel(shape=(), dtype=tf.int64, num_classes=128),
    'qualities': FeaturesDict({
        'bright': Tensor(shape=(), dtype=tf.bool),
        'dark': Tensor(shape=(), dtype=tf.bool),
        'distortion': Tensor(shape=(), dtype=tf.bool),
        'fast_decay': Tensor(shape=(), dtype=tf.bool),
        'long_release': Tensor(shape=(), dtype=tf.bool),
        'multiphonic': Tensor(shape=(), dtype=tf.bool),
        'nonlinear_env': Tensor(shape=(), dtype=tf.bool),
        'percussive': Tensor(shape=(), dtype=tf.bool),
        'reverb': Tensor(shape=(), dtype=tf.bool),
        'tempo-synced': Tensor(shape=(), dtype=tf.bool),
    }),
    'velocity': ClassLabel(shape=(), dtype=tf.int64, num_classes=128),
})

Statistics

Split Examples
ALL 305,979
TRAIN 289,205
VALID 12,678
TEST 4,096

Urls

Supervised keys (for as_supervised=True)

None
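
Because no supervised keys are defined, as_supervised=True is not available here; examples come back as feature dictionaries. A minimal sketch of building (audio, pitch) pairs by hand, using the feature names from the dictionary above:

import tensorflow_datasets as tfds

# No supervised keys are registered for nsynth, so map the example
# dictionary to an (audio, pitch) tuple manually.
ds = tfds.load("nsynth", split="train")
ds = ds.map(lambda ex: (ex["audio"], ex["pitch"]))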

Citation

@InProceedings{pmlr-v70-engel17a,
  title =    {Neural Audio Synthesis of Musical Notes with {W}ave{N}et Autoencoders},
  author =   {Jesse Engel and Cinjon Resnick and Adam Roberts and Sander Dieleman and Mohammad Norouzi and Douglas Eck and Karen Simonyan},
  booktitle =    {Proceedings of the 34th International Conference on Machine Learning},
  pages =    {1068--1077},
  year =     {2017},
  editor =   {Doina Precup and Yee Whye Teh},
  volume =   {70},
  series =   {Proceedings of Machine Learning Research},
  address =      {International Convention Centre, Sydney, Australia},
  month =    {06--11 Aug},
  publisher =    {PMLR},
  pdf =      {http://proceedings.mlr.press/v70/engel17a/engel17a.pdf},
  url =      {http://proceedings.mlr.press/v70/engel17a.html},
}

image

"caltech101"

Caltech-101 consists of pictures of objects belonging to 101 classes, plus one background clutter class. Each image is labelled with a single object. Each class contains roughly 40 to 800 images, totalling around 9k images. Images are of variable sizes, with typical edge lengths of 200-300 pixels. This version contains image-level labels only. The original dataset also contains bounding boxes.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'image/file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=102),
})

Statistics

Split Examples
TRAIN 9,144
ALL 9,144

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')
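
Since supervised keys are defined, the dataset can be loaded directly as (image, label) tuples; a minimal sketch:

import tensorflow_datasets as tfds

# With as_supervised=True each element is an (image, label) tuple
# instead of a feature dictionary.
ds = tfds.load("caltech101", split="train", as_supervised=True)
for image, label in tfds.as_numpy(ds.take(1)):
    print(image.shape, label)  # variable-size HxWx3 image, integer label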

Citation

@article{FeiFei2004LearningGV,
  title={Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories},
  author={Li Fei-Fei and Rob Fergus and Pietro Perona},
  journal={Computer Vision and Pattern Recognition Workshop},
  year={2004},
}

"cats_vs_dogs"

A large set of images of cats and dogs. There are 1,738 corrupted images that are dropped.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

Statistics

Split Examples
TRAIN 23,262
ALL 23,262

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@inproceedings{asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization,
author = {Elson, Jeremy and Douceur, John (JD) and Howell, Jon and Saul, Jared},
title = {Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization},
booktitle = {Proceedings of 14th ACM Conference on Computer and Communications Security (CCS)},
year = {2007},
month = {October},
publisher = {Association for Computing Machinery, Inc.},
url = {https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/},
edition = {Proceedings of 14th ACM Conference on Computer and Communications Security (CCS)},
}

"celeb_a"

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including:

  • 10,177 identities,

  • 202,599 face images, and

  • 5 landmark locations and 40 binary attribute annotations per image.

The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face detection, and landmark (or facial part) localization.

Features

FeaturesDict({
    'attributes': FeaturesDict({
        '5_o_Clock_Shadow': Tensor(shape=(), dtype=tf.bool),
        'Arched_Eyebrows': Tensor(shape=(), dtype=tf.bool),
        'Attractive': Tensor(shape=(), dtype=tf.bool),
        'Bags_Under_Eyes': Tensor(shape=(), dtype=tf.bool),
        'Bald': Tensor(shape=(), dtype=tf.bool),
        'Bangs': Tensor(shape=(), dtype=tf.bool),
        'Big_Lips': Tensor(shape=(), dtype=tf.bool),
        'Big_Nose': Tensor(shape=(), dtype=tf.bool),
        'Black_Hair': Tensor(shape=(), dtype=tf.bool),
        'Blond_Hair': Tensor(shape=(), dtype=tf.bool),
        'Blurry': Tensor(shape=(), dtype=tf.bool),
        'Brown_Hair': Tensor(shape=(), dtype=tf.bool),
        'Bushy_Eyebrows': Tensor(shape=(), dtype=tf.bool),
        'Chubby': Tensor(shape=(), dtype=tf.bool),
        'Double_Chin': Tensor(shape=(), dtype=tf.bool),
        'Eyeglasses': Tensor(shape=(), dtype=tf.bool),
        'Goatee': Tensor(shape=(), dtype=tf.bool),
        'Gray_Hair': Tensor(shape=(), dtype=tf.bool),
        'Heavy_Makeup': Tensor(shape=(), dtype=tf.bool),
        'High_Cheekbones': Tensor(shape=(), dtype=tf.bool),
        'Male': Tensor(shape=(), dtype=tf.bool),
        'Mouth_Slightly_Open': Tensor(shape=(), dtype=tf.bool),
        'Mustache': Tensor(shape=(), dtype=tf.bool),
        'Narrow_Eyes': Tensor(shape=(), dtype=tf.bool),
        'No_Beard': Tensor(shape=(), dtype=tf.bool),
        'Oval_Face': Tensor(shape=(), dtype=tf.bool),
        'Pale_Skin': Tensor(shape=(), dtype=tf.bool),
        'Pointy_Nose': Tensor(shape=(), dtype=tf.bool),
        'Receding_Hairline': Tensor(shape=(), dtype=tf.bool),
        'Rosy_Cheeks': Tensor(shape=(), dtype=tf.bool),
        'Sideburns': Tensor(shape=(), dtype=tf.bool),
        'Smiling': Tensor(shape=(), dtype=tf.bool),
        'Straight_Hair': Tensor(shape=(), dtype=tf.bool),
        'Wavy_Hair': Tensor(shape=(), dtype=tf.bool),
        'Wearing_Earrings': Tensor(shape=(), dtype=tf.bool),
        'Wearing_Hat': Tensor(shape=(), dtype=tf.bool),
        'Wearing_Lipstick': Tensor(shape=(), dtype=tf.bool),
        'Wearing_Necklace': Tensor(shape=(), dtype=tf.bool),
        'Wearing_Necktie': Tensor(shape=(), dtype=tf.bool),
        'Young': Tensor(shape=(), dtype=tf.bool),
    }),
    'image': Image(shape=(218, 178, 3), dtype=tf.uint8),
    'landmarks': FeaturesDict({
        'lefteye_x': Tensor(shape=(), dtype=tf.int64),
        'lefteye_y': Tensor(shape=(), dtype=tf.int64),
        'leftmouth_x': Tensor(shape=(), dtype=tf.int64),
        'leftmouth_y': Tensor(shape=(), dtype=tf.int64),
        'nose_x': Tensor(shape=(), dtype=tf.int64),
        'nose_y': Tensor(shape=(), dtype=tf.int64),
        'righteye_x': Tensor(shape=(), dtype=tf.int64),
        'righteye_y': Tensor(shape=(), dtype=tf.int64),
        'rightmouth_x': Tensor(shape=(), dtype=tf.int64),
        'rightmouth_y': Tensor(shape=(), dtype=tf.int64),
    }),
})

Statistics

Split Examples
ALL 202,599
TRAIN 162,770
TEST 19,962
VALIDATION 19,867

Urls

Supervised keys (for as_supervised=True)

None

Citation

@inproceedings{conf/iccv/LiuLWT15,
  added-at = {2018-10-09T00:00:00.000+0200},
  author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  biburl = {https://www.bibsonomy.org/bibtex/250e4959be61db325d2f02c1d8cd7bfbb/dblp},
  booktitle = {ICCV},
  crossref = {conf/iccv/2015},
  ee = {http://doi.ieeecomputersociety.org/10.1109/ICCV.2015.425},
  interhash = {3f735aaa11957e73914bbe2ca9d5e702},
  intrahash = {50e4959be61db325d2f02c1d8cd7bfbb},
  isbn = {978-1-4673-8391-2},
  keywords = {dblp},
  pages = {3730-3738},
  publisher = {IEEE Computer Society},
  timestamp = {2018-10-11T11:43:28.000+0200},
  title = {Deep Learning Face Attributes in the Wild.},
  url = {http://dblp.uni-trier.de/db/conf/iccv/iccv2015.html#LiuLWT15},
  year = 2015
}

"celeb_a_hq"

High-quality version of the CelebA dataset, consisting of 30,000 images at 1024 x 1024 resolution.

WARNING: This dataset currently requires you to prepare images on your own.

celeb_a_hq is configured with tfds.image.celebahq.CelebaHQConfig and has the following configurations predefined (defaults to the first one; a loading sketch follows the list):

  • "1024" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 1024 x 1024 resolution

  • "512" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 512 x 512 resolution

  • "256" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 256 x 256 resolution

  • "128" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 128 x 128 resolution

  • "64" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 64 x 64 resolution

  • "32" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 32 x 32 resolution

  • "16" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 16 x 16 resolution

  • "8" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 8 x 8 resolution

  • "4" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 4 x 4 resolution

  • "2" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 2 x 2 resolution

  • "1" (v0.1.0) (Size: ?? GiB): CelebaHQ images in 1 x 1 resolution

"celeb_a_hq/1024"

FeaturesDict({
    'image': Image(shape=(1024, 1024, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/512"

FeaturesDict({
    'image': Image(shape=(512, 512, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/256"

FeaturesDict({
    'image': Image(shape=(256, 256, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/128"

FeaturesDict({
    'image': Image(shape=(128, 128, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/64"

FeaturesDict({
    'image': Image(shape=(64, 64, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/32"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/16"

FeaturesDict({
    'image': Image(shape=(16, 16, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/8"

FeaturesDict({
    'image': Image(shape=(8, 8, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/4"

FeaturesDict({
    'image': Image(shape=(4, 4, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/2"

FeaturesDict({
    'image': Image(shape=(2, 2, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

"celeb_a_hq/1"

FeaturesDict({
    'image': Image(shape=(1, 1, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

Split Examples
TRAIN 30,000
ALL 30,000

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{DBLP:journals/corr/abs-1710-10196,
  author    = {Tero Karras and
               Timo Aila and
               Samuli Laine and
               Jaakko Lehtinen},
  title     = {Progressive Growing of GANs for Improved Quality, Stability, and Variation},
  journal   = {CoRR},
  volume    = {abs/1710.10196},
  year      = {2017},
  url       = {http://arxiv.org/abs/1710.10196},
  archivePrefix = {arXiv},
  eprint    = {1710.10196},
  timestamp = {Mon, 13 Aug 2018 16:46:42 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1710-10196},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"cifar10"

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Features

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
ALL 60,000
TRAIN 50,000
TEST 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@TECHREPORT{Krizhevsky09learningmultiple,
    author = {Alex Krizhevsky},
    title = {Learning multiple layers of features from tiny images},
    institution = {},
    year = {2009}
}

"cifar100"

This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).

Features

FeaturesDict({
    'coarse_label': ClassLabel(shape=(), dtype=tf.int64, num_classes=20),
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=100),
})

Statistics

Split Examples
ALL 60,000
TRAIN 50,000
TEST 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@TECHREPORT{Krizhevsky09learningmultiple,
    author = {Alex Krizhevsky},
    title = {Learning multiple layers of features from tiny images},
    institution = {},
    year = {2009}
}

"cifar10_corrupted"

Cifar10Corrupted is a dataset generated by adding 15 common corruptions to the test images in the Cifar10 dataset. This dataset wraps the corrupted Cifar10 test images uploaded by the original authors.

cifar10_corrupted is configured with tfds.image.cifar10_corrupted.Cifar10CorruptedConfig and has the following configurations predefined (defaults to the first one; a loading sketch follows the list):

  • "brightness_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: brightness, severity level: 1

  • "brightness_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: brightness, severity level: 2

  • "brightness_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: brightness, severity level: 3

  • "brightness_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: brightness, severity level: 4

  • "brightness_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: brightness, severity level: 5

  • "contrast_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: contrast, severity level: 1

  • "contrast_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: contrast, severity level: 2

  • "contrast_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: contrast, severity level: 3

  • "contrast_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: contrast, severity level: 4

  • "contrast_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: contrast, severity level: 5

  • "defocus_blur_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: defocus_blur, severity level: 1

  • "defocus_blur_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: defocus_blur, severity level: 2

  • "defocus_blur_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: defocus_blur, severity level: 3

  • "defocus_blur_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: defocus_blur, severity level: 4

  • "defocus_blur_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: defocus_blur, severity level: 5

  • "elastic_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: elastic, severity level: 1

  • "elastic_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: elastic, severity level: 2

  • "elastic_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: elastic, severity level: 3

  • "elastic_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: elastic, severity level: 4

  • "elastic_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: elastic, severity level: 5

  • "fog_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: fog, severity level: 1

  • "fog_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: fog, severity level: 2

  • "fog_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: fog, severity level: 3

  • "fog_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: fog, severity level: 4

  • "fog_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: fog, severity level: 5

  • "frost_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: frost, severity level: 1

  • "frost_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: frost, severity level: 2

  • "frost_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: frost, severity level: 3

  • "frost_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: frost, severity level: 4

  • "frost_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: frost, severity level: 5

  • "frosted_glass_blur_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: frosted_glass_blur, severity level: 1

  • "frosted_glass_blur_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: frosted_glass_blur, severity level: 2

  • "frosted_glass_blur_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: frosted_glass_blur, severity level: 3

  • "frosted_glass_blur_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: frosted_glass_blur, severity level: 4

  • "frosted_glass_blur_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: frosted_glass_blur, severity level: 5

  • "gaussian_noise_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: gaussian_noise, severity level: 1

  • "gaussian_noise_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: gaussian_noise, severity level: 2

  • "gaussian_noise_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: gaussian_noise, severity level: 3

  • "gaussian_noise_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: gaussian_noise, severity level: 4

  • "gaussian_noise_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: gaussian_noise, severity level: 5

  • "impulse_noise_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: impulse_noise, severity level: 1

  • "impulse_noise_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: impulse_noise, severity level: 2

  • "impulse_noise_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: impulse_noise, severity level: 3

  • "impulse_noise_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: impulse_noise, severity level: 4

  • "impulse_noise_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: impulse_noise, severity level: 5

  • "jpeg_compression_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: jpeg_compression, severity level: 1

  • "jpeg_compression_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: jpeg_compression, severity level: 2

  • "jpeg_compression_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: jpeg_compression, severity level: 3

  • "jpeg_compression_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: jpeg_compression, severity level: 4

  • "jpeg_compression_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: jpeg_compression, severity level: 5

  • "motion_blur_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: motion_blur, severity level: 1

  • "motion_blur_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: motion_blur, severity level: 2

  • "motion_blur_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: motion_blur, severity level: 3

  • "motion_blur_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: motion_blur, severity level: 4

  • "motion_blur_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: motion_blur, severity level: 5

  • "pixelate_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: pixelate, severity level: 1

  • "pixelate_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: pixelate, severity level: 2

  • "pixelate_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: pixelate, severity level: 3

  • "pixelate_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: pixelate, severity level: 4

  • "pixelate_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: pixelate, severity level: 5

  • "shot_noise_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: shot_noise, severity level: 1

  • "shot_noise_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: shot_noise, severity level: 2

  • "shot_noise_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: shot_noise, severity level: 3

  • "shot_noise_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: shot_noise, severity level: 4

  • "shot_noise_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: shot_noise, severity level: 5

  • "snow_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: snow, severity level: 1

  • "snow_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: snow, severity level: 2

  • "snow_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: snow, severity level: 3

  • "snow_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: snow, severity level: 4

  • "snow_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: snow, severity level: 5

  • "zoom_blur_1" (v0.0.1) (Size: 2.72 GiB): Corruption method: zoom_blur, severity level: 1

  • "zoom_blur_2" (v0.0.1) (Size: 2.72 GiB): Corruption method: zoom_blur, severity level: 2

  • "zoom_blur_3" (v0.0.1) (Size: 2.72 GiB): Corruption method: zoom_blur, severity level: 3

  • "zoom_blur_4" (v0.0.1) (Size: 2.72 GiB): Corruption method: zoom_blur, severity level: 4

  • "zoom_blur_5" (v0.0.1) (Size: 2.72 GiB): Corruption method: zoom_blur, severity level: 5

"cifar10_corrupted/brightness_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/brightness_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/brightness_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/brightness_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/brightness_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/contrast_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/contrast_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/contrast_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/contrast_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/contrast_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/defocus_blur_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/defocus_blur_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/defocus_blur_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/defocus_blur_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/defocus_blur_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/elastic_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/elastic_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/elastic_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/elastic_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/elastic_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/fog_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/fog_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/fog_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/fog_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/fog_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frost_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frost_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frost_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frost_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frost_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frosted_glass_blur_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frosted_glass_blur_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frosted_glass_blur_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frosted_glass_blur_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/frosted_glass_blur_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/gaussian_noise_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/gaussian_noise_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/gaussian_noise_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/gaussian_noise_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/gaussian_noise_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/impulse_noise_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/impulse_noise_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/impulse_noise_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/impulse_noise_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/impulse_noise_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/jpeg_compression_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/jpeg_compression_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/jpeg_compression_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/jpeg_compression_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/jpeg_compression_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/motion_blur_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/motion_blur_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/motion_blur_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/motion_blur_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/motion_blur_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/pixelate_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/pixelate_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/pixelate_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/pixelate_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/pixelate_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/shot_noise_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/shot_noise_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/shot_noise_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/shot_noise_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/shot_noise_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/snow_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/snow_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/snow_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/snow_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/snow_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/zoom_blur_1"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/zoom_blur_2"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/zoom_blur_3"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/zoom_blur_4"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"cifar10_corrupted/zoom_blur_5"

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
TEST 10,000
ALL 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@inproceedings{
  hendrycks2018benchmarking,
  title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},
  author={Dan Hendrycks and Thomas Dietterich},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=HJz6tiCqYm},
}

"coco2014"

COCO is a large-scale object detection, segmentation, and captioning dataset. This version contains images, bounding boxes and labels for the 2014 version. Note:

  • Some images from the train and validation sets don't have annotations.

  • The test split doesn't have any annotations (only images).

  • COCO defines 91 classes, but the data only uses 80 of them.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
    'objects': SequenceDict({
        'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
        'is_crowd': Tensor(shape=(), dtype=tf.bool),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=80),
    }),
})
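
Each example carries a variable number of objects via the SequenceDict. A minimal sketch of reading them; the BBoxFeature layout is assumed here to be normalized [ymin, xmin, ymax, xmax] coordinates (check tfds.features.BBoxFeature if the exact convention matters):

import tensorflow_datasets as tfds

ds = tfds.load("coco2014", split="train")
for ex in tfds.as_numpy(ds.take(1)):
    boxes = ex["objects"]["bbox"]    # shape (num_objects, 4)
    labels = ex["objects"]["label"]  # shape (num_objects,)
    print(boxes.shape, labels.shape)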

Statistics

Split Examples
ALL 245,496
TRAIN 82,783
TEST2015 81,434
TEST 40,775
VALIDATION 40,504

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{DBLP:journals/corr/LinMBHPRDZ14,
  author    = {Tsung{-}Yi Lin and
               Michael Maire and
               Serge J. Belongie and
               Lubomir D. Bourdev and
               Ross B. Girshick and
               James Hays and
               Pietro Perona and
               Deva Ramanan and
               Piotr Doll{\'{a}}r and
               C. Lawrence Zitnick},
  title     = {Microsoft {COCO:} Common Objects in Context},
  journal   = {CoRR},
  volume    = {abs/1405.0312},
  year      = {2014},
  url       = {http://arxiv.org/abs/1405.0312},
  archivePrefix = {arXiv},
  eprint    = {1405.0312},
  timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"colorectal_histology"

Classification of textures in colorectal cancer histology. Each example is a 150 x 150 x 3 RGB image of one of 8 classes.

Features

FeaturesDict({
    'filename': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(150, 150, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=8),
})

Statistics

Split Examples
TRAIN 5,000
ALL 5,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{kather2016multi,
  title={Multi-class texture analysis in colorectal cancer histology},
  author={Kather, Jakob Nikolas and Weis, Cleo-Aron and Bianconi, Francesco and Melchers, Susanne M and Schad, Lothar R and Gaiser, Timo and Marx, Alexander and Z{\"o}llner, Frank Gerrit},
  journal={Scientific reports},
  volume={6},
  pages={27988},
  year={2016},
  publisher={Nature Publishing Group}
}

"colorectal_histology_large"

10 large 5000 x 5000 textured colorectal cancer histology images

Features

FeaturesDict({
    'filename': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(5000, 5000, 3), dtype=tf.uint8),
})

Statistics

Split Examples
TEST 10
ALL 10

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{kather2016multi,
  title={Multi-class texture analysis in colorectal cancer histology},
  author={Kather, Jakob Nikolas and Weis, Cleo-Aron and Bianconi, Francesco and Melchers, Susanne M and Schad, Lothar R and Gaiser, Timo and Marx, Alexander and Z{\"o}llner, Frank Gerrit},
  journal={Scientific reports},
  volume={6},
  pages={27988},
  year={2016},
  publisher={Nature Publishing Group}
}

"cycle_gan"

A dataset with images from two classes (see the config name for information on the specific classes).

cycle_gan is configured with tfds.image.cycle_gan.CycleGANConfig and has the following configurations predefined (defaults to the first one):

  • "apple2orange" (v0.1.0) (Size: 74.82 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "summer2winter_yosemite" (v0.1.0) (Size: 126.50 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "horse2zebra" (v0.1.0) (Size: 111.45 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "monet2photo" (v0.1.0) (Size: 291.09 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "cezanne2photo" (v0.1.0) (Size: 266.92 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "ukiyoe2photo" (v0.1.0) (Size: 279.38 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "vangogh2photo" (v0.1.0) (Size: 292.39 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "maps" (v0.1.0) (Size: 1.38 GiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "cityscapes" (v0.1.0) (Size: 266.65 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "facades" (v0.1.0) (Size: 33.51 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

  • "iphone2dslr_flower" (v0.1.0) (Size: 324.22 MiB): A dataset consisting of images from two classes: A and B for example: horses and zebras.

"cycle_gan/apple2orange"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/summer2winter_yosemite"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/horse2zebra"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/monet2photo"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/cezanne2photo"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/ukiyoe2photo"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/vangogh2photo"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/maps"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/cityscapes"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/facades"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

"cycle_gan/iphone2dslr_flower"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

Statistics

Split Examples
ALL 6,186
TRAINB 3,325
TRAINA 1,812
TESTA 569
TESTB 480

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')
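
The two domains are exposed as separate splits, matching the split names in the statistics above. A minimal sketch loading both training domains of horse2zebra (the lowercase split names are an assumption based on the usual TFDS convention):

import tensorflow_datasets as tfds

# CycleGAN-style configs ship two unpaired image domains.
horses = tfds.load("cycle_gan/horse2zebra", split="trainA")
zebras = tfds.load("cycle_gan/horse2zebra", split="trainB")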


"diabetic_retinopathy_detection"

A large set of high-resolution retina images taken under a variety of imaging conditions.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=5),
    'name': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

Split Examples
ALL 88,712
TEST 53,576
TRAIN 35,126
SAMPLE 10

Urls

Supervised keys (for as_supervised=True)

None

Citation

@ONLINE {kaggle-diabetic-retinopathy,
    author = "Kaggle and EyePacs",
    title  = "Kaggle Diabetic Retinopathy Detection",
    month  = "jul",
    year   = "2015",
    url    = "https://www.kaggle.com/c/diabetic-retinopathy-detection/data"
}

"dsprites"

dSprites is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, and the x and y positions of a sprite.

All possible combinations of these latents are present exactly once, generating N = 737280 total images.

Latent factor values

  • Color: white
  • Shape: square, ellipse, heart
  • Scale: 6 values linearly spaced in [0.5, 1]
  • Orientation: 40 values in [0, 2 pi]
  • Position X: 32 values in [0, 1]
  • Position Y: 32 values in [0, 1]

We varied one latent at a time (starting from Position Y, then Position X, etc), and sequentially stored the images in fixed order. Hence the order along the first dimension is fixed and allows you to map back to the value of the latents corresponding to that image.

We chose the latents values deliberately to have the smallest step changes while ensuring that all pixel outputs were different. No noise was added.
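
As a sanity check, the latent sizes above multiply out to the stated total, and the fixed storage order means a flat image index can be decoded back into latent indices. A minimal sketch; the decode order (color slowest through Position Y fastest) is an assumption based on the description above:

# Latent sizes: color, shape, scale, orientation, position X, position Y.
sizes = [1, 3, 6, 40, 32, 32]

total = 1
for s in sizes:
    total *= s
assert total == 737280  # matches N in the description

def decode_index(i):
    """Decode a flat image index into per-latent indices (assumed order)."""
    out = []
    for s in reversed(sizes):  # Position Y varies fastest
        out.append(i % s)
        i //= s
    return list(reversed(out))  # [color, shape, scale, orientation, x, y]

print(decode_index(0))   # [0, 0, 0, 0, 0, 0]
print(decode_index(33))  # [0, 0, 0, 0, 1, 1]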

Features

FeaturesDict({
    'image': Image(shape=(64, 64, 1), dtype=tf.uint8),
    'label_orientation': ClassLabel(shape=(), dtype=tf.int64, num_classes=40),
    'label_scale': ClassLabel(shape=(), dtype=tf.int64, num_classes=6),
    'label_shape': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'label_x_position': ClassLabel(shape=(), dtype=tf.int64, num_classes=32),
    'label_y_position': ClassLabel(shape=(), dtype=tf.int64, num_classes=32),
    'value_orientation': Tensor(shape=[], dtype=tf.float32),
    'value_scale': Tensor(shape=[], dtype=tf.float32),
    'value_shape': Tensor(shape=[], dtype=tf.float32),
    'value_x_position': Tensor(shape=[], dtype=tf.float32),
    'value_y_position': Tensor(shape=[], dtype=tf.float32),
})

Statistics

Split Examples
TRAIN 737,280
ALL 737,280

Urls

Supervised keys (for as_supervised=True)

None

Citation

@misc{dsprites17,
author = {Loic Matthey and Irina Higgins and Demis Hassabis and Alexander Lerchner},
title = {dSprites: Disentanglement testing Sprites dataset},
howpublished= {https://github.com/deepmind/dsprites-dataset/},
year = "2017",
}

"dtd"

The Describable Textures Dataset (DTD) is an evolving collection of textural images in the wild, annotated with a series of human-centric attributes, inspired by the perceptual properties of textures. This data is made available to the computer vision community for research purposes.

The "label" of each example is its "key attribute" (see the official website). The official release of the dataset defines a 10-fold cross-validation partition. Our TRAIN/TEST/VALIDATION splits are those of the first fold.

Features

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=47),
})

Statistics

Split Examples
ALL 5,640
VALIDATION 1,880
TRAIN 1,880
TEST 1,880

Urls

Supervised keys (for as_supervised=True)

None

Citation

@InProceedings{cimpoi14describing,
Author    = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and A. Vedaldi},
Title     = {Describing Textures in the Wild},
Booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},
Year      = {2014}}

"emnist"

The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset.

emnist is configured with tfds.image.mnist.EMNISTConfig and has the following configurations predefined (defaults to the first one; a loading sketch follows the list):

  • "byclass" (v1.0.1) (Size: 535.73 MiB): EMNIST ByClass

  • "bymerge" (v1.0.1) (Size: 535.73 MiB): EMNIST ByMerge

  • "balanced" (v1.0.1) (Size: 535.73 MiB): EMNIST Balanced

  • "letters" (v1.0.1) (Size: 535.73 MiB): EMNIST Letters

  • "digits" (v1.0.1) (Size: 535.73 MiB): EMNIST Digits

  • "mnist" (v1.0.1) (Size: 535.73 MiB): EMNIST MNIST

"emnist/byclass"

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=62),
})

"emnist/bymerge"

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=47),
})

"emnist/balanced"

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=47),
})

"emnist/letters"

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=37),
})

"emnist/digits"

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

"emnist/mnist"

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
ALL 70,000
TRAIN 60,000
TEST 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{cohen_afshar_tapson_schaik_2017, 
    title={EMNIST: Extending MNIST to handwritten letters}, 
    DOI={10.1109/ijcnn.2017.7966217}, 
    journal={2017 International Joint Conference on Neural Networks (IJCNN)}, 
    author={Cohen, Gregory and Afshar, Saeed and Tapson, Jonathan and Schaik, Andre Van}, 
    year={2017}
}

"fashion_mnist"

Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

Features

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
ALL 70,000
TRAIN 60,000
TEST 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{DBLP:journals/corr/abs-1708-07747,
  author    = {Han Xiao and
               Kashif Rasul and
               Roland Vollgraf},
  title     = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning
               Algorithms},
  journal   = {CoRR},
  volume    = {abs/1708.07747},
  year      = {2017},
  url       = {http://arxiv.org/abs/1708.07747},
  archivePrefix = {arXiv},
  eprint    = {1708.07747},
  timestamp = {Mon, 13 Aug 2018 16:47:27 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1708-07747},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"horses_or_humans"

A large set of images of horses and humans.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})

Statistics

Split Examples
ALL 1,283
TRAIN 1,027
TEST 256

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@ONLINE {horses_or_humans,
author = "Laurence Moroney",
title = "Horses or Humans Dataset",
month = "feb",
year = "2019",
url = "http://laurencemoroney.com/horses-or-humans-dataset"
}

"image_label_folder"

Generic image classification dataset.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=None),
})

Statistics

None computed

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')


"imagenet2012"

ILSVRC 2012, aka ImageNet, is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of them (80,000+) nouns. In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.

Features

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

Statistics

Split Examples
ALL 1,331,167
TRAIN 1,281,167
VALIDATION 50,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')
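
ImageNet cannot be downloaded automatically: the source archives must be obtained separately and placed in the TFDS manual download directory before preparation. A minimal sketch of the preparation step (the exact directory is configurable; see the TFDS documentation):

import tensorflow_datasets as tfds

# Requires the ILSVRC2012 source archives to be present in the TFDS
# manual download directory beforehand; automatic download is not possible.
builder = tfds.builder("imagenet2012")
builder.download_and_prepare()
ds = builder.as_dataset(split="train", as_supervised=True)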

Citation

@article{ILSVRC15,
Author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
Title = { {ImageNet Large Scale Visual Recognition Challenge}},
Year = {2015},
journal   = {International Journal of Computer Vision (IJCV)},
doi = {10.1007/s11263-015-0816-y},
volume={115},
number={3},
pages={211-252}
}

"imagenet2012_corrupted"

Imagenet2012Corrupted is a dataset generated by adding common corruptions to the validation images in the ImageNet dataset. In the original paper, there are 15 different corruptions, each with 5 levels of severity. In this dataset, we implement 12 of the 15 corruptions: Gaussian noise, shot noise, impulse noise, defocus blur, frosted glass blur, zoom blur, fog, brightness, contrast, elastic, pixelate, and JPEG compression. The randomness is fixed so that regeneration is deterministic.

imagenet2012_corrupted is configured with tfds.image.imagenet2012_corrupted.Imagenet2012CorruptedConfig and has the following configurations predefined (defaults to the first one):

  • "gaussian_noise_1" (v0.0.1) (Size: ?? GiB): corruption type = gaussian_noise, severity = 1

  • "gaussian_noise_2" (v0.0.1) (Size: ?? GiB): corruption type = gaussian_noise, severity = 2

  • "gaussian_noise_3" (v0.0.1) (Size: ?? GiB): corruption type = gaussian_noise, severity = 3

  • "gaussian_noise_4" (v0.0.1) (Size: ?? GiB): corruption type = gaussian_noise, severity = 4

  • "gaussian_noise_5" (v0.0.1) (Size: ?? GiB): corruption type = gaussian_noise, severity = 5

  • "shot_noise_1" (v0.0.1) (Size: ?? GiB): corruption type = shot_noise, severity = 1

  • "shot_noise_2" (v0.0.1) (Size: ?? GiB): corruption type = shot_noise, severity = 2

  • "shot_noise_3" (v0.0.1) (Size: ?? GiB): corruption type = shot_noise, severity = 3

  • "shot_noise_4" (v0.0.1) (Size: ?? GiB): corruption type = shot_noise, severity = 4

  • "shot_noise_5" (v0.0.1) (Size: ?? GiB): corruption type = shot_noise, severity = 5

  • "impulse_noise_1" (v0.0.1) (Size: ?? GiB): corruption type = impulse_noise, severity = 1

  • "impulse_noise_2" (v0.0.1) (Size: ?? GiB): corruption type = impulse_noise, severity = 2

  • "impulse_noise_3" (v0.0.1) (Size: ?? GiB): corruption type = impulse_noise, severity = 3

  • "impulse_noise_4" (v0.0.1) (Size: ?? GiB): corruption type = impulse_noise, severity = 4

  • "impulse_noise_5" (v0.0.1) (Size: ?? GiB): corruption type = impulse_noise, severity = 5

  • "defocus_blur_1" (v0.0.1) (Size: ?? GiB): corruption type = defocus_blur, severity = 1

  • "defocus_blur_2" (v0.0.1) (Size: ?? GiB): corruption type = defocus_blur, severity = 2

  • "defocus_blur_3" (v0.0.1) (Size: ?? GiB): corruption type = defocus_blur, severity = 3

  • "defocus_blur_4" (v0.0.1) (Size: ?? GiB): corruption type = defocus_blur, severity = 4

  • "defocus_blur_5" (v0.0.1) (Size: ?? GiB): corruption type = defocus_blur, severity = 5

  • "frosted_glass_blur_1" (v0.0.1) (Size: ?? GiB): corruption type = frosted_glass_blur, severity = 1

  • "frosted_glass_blur_2" (v0.0.1) (Size: ?? GiB): corruption type = frosted_glass_blur, severity = 2

  • "frosted_glass_blur_3" (v0.0.1) (Size: ?? GiB): corruption type = frosted_glass_blur, severity = 3

  • "frosted_glass_blur_4" (v0.0.1) (Size: ?? GiB): corruption type = frosted_glass_blur, severity = 4

  • "frosted_glass_blur_5" (v0.0.1) (Size: ?? GiB): corruption type = frosted_glass_blur, severity = 5

  • "zoom_blur_1" (v0.0.1) (Size: ?? GiB): corruption type = zoom_blur, severity = 1

  • "zoom_blur_2" (v0.0.1) (Size: ?? GiB): corruption type = zoom_blur, severity = 2

  • "zoom_blur_3" (v0.0.1) (Size: ?? GiB): corruption type = zoom_blur, severity = 3

  • "zoom_blur_4" (v0.0.1) (Size: ?? GiB): corruption type = zoom_blur, severity = 4

  • "zoom_blur_5" (v0.0.1) (Size: ?? GiB): corruption type = zoom_blur, severity = 5

  • "fog_1" (v0.0.1) (Size: ?? GiB): corruption type = fog, severity = 1

  • "fog_2" (v0.0.1) (Size: ?? GiB): corruption type = fog, severity = 2

  • "fog_3" (v0.0.1) (Size: ?? GiB): corruption type = fog, severity = 3

  • "fog_4" (v0.0.1) (Size: ?? GiB): corruption type = fog, severity = 4

  • "fog_5" (v0.0.1) (Size: ?? GiB): corruption type = fog, severity = 5

  • "brightness_1" (v0.0.1) (Size: ?? GiB): corruption type = brightness, severity = 1

  • "brightness_2" (v0.0.1) (Size: ?? GiB): corruption type = brightness, severity = 2

  • "brightness_3" (v0.0.1) (Size: ?? GiB): corruption type = brightness, severity = 3

  • "brightness_4" (v0.0.1) (Size: ?? GiB): corruption type = brightness, severity = 4

  • "brightness_5" (v0.0.1) (Size: ?? GiB): corruption type = brightness, severity = 5

  • "contrast_1" (v0.0.1) (Size: ?? GiB): corruption type = contrast, severity = 1

  • "contrast_2" (v0.0.1) (Size: ?? GiB): corruption type = contrast, severity = 2

  • "contrast_3" (v0.0.1) (Size: ?? GiB): corruption type = contrast, severity = 3

  • "contrast_4" (v0.0.1) (Size: ?? GiB): corruption type = contrast, severity = 4

  • "contrast_5" (v0.0.1) (Size: ?? GiB): corruption type = contrast, severity = 5

  • "elastic_1" (v0.0.1) (Size: ?? GiB): corruption type = elastic, severity = 1

  • "elastic_2" (v0.0.1) (Size: ?? GiB): corruption type = elastic, severity = 2

  • "elastic_3" (v0.0.1) (Size: ?? GiB): corruption type = elastic, severity = 3

  • "elastic_4" (v0.0.1) (Size: ?? GiB): corruption type = elastic, severity = 4

  • "elastic_5" (v0.0.1) (Size: ?? GiB): corruption type = elastic, severity = 5

  • "pixelate_1" (v0.0.1) (Size: ?? GiB): corruption type = pixelate, severity = 1

  • "pixelate_2" (v0.0.1) (Size: ?? GiB): corruption type = pixelate, severity = 2

  • "pixelate_3" (v0.0.1) (Size: ?? GiB): corruption type = pixelate, severity = 3

  • "pixelate_4" (v0.0.1) (Size: ?? GiB): corruption type = pixelate, severity = 4

  • "pixelate_5" (v0.0.1) (Size: ?? GiB): corruption type = pixelate, severity = 5

  • "jpeg_compression_1" (v0.0.1) (Size: ?? GiB): corruption type = jpeg_compression, severity = 1

  • "jpeg_compression_2" (v0.0.1) (Size: ?? GiB): corruption type = jpeg_compression, severity = 2

  • "jpeg_compression_3" (v0.0.1) (Size: ?? GiB): corruption type = jpeg_compression, severity = 3

  • "jpeg_compression_4" (v0.0.1) (Size: ?? GiB): corruption type = jpeg_compression, severity = 4

  • "jpeg_compression_5" (v0.0.1) (Size: ?? GiB): corruption type = jpeg_compression, severity = 5

"imagenet2012_corrupted/gaussian_noise_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/gaussian_noise_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/gaussian_noise_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/gaussian_noise_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/gaussian_noise_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/shot_noise_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/shot_noise_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/shot_noise_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/shot_noise_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/shot_noise_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/impulse_noise_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/impulse_noise_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/impulse_noise_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/impulse_noise_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/impulse_noise_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/defocus_blur_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/defocus_blur_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/defocus_blur_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/defocus_blur_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/defocus_blur_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/frosted_glass_blur_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/frosted_glass_blur_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/frosted_glass_blur_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/frosted_glass_blur_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/frosted_glass_blur_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/zoom_blur_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/zoom_blur_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/zoom_blur_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/zoom_blur_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/zoom_blur_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/fog_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/fog_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/fog_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/fog_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/fog_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/brightness_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/brightness_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/brightness_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/brightness_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/brightness_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/contrast_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/contrast_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/contrast_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/contrast_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/contrast_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/elastic_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/elastic_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/elastic_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/elastic_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/elastic_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/pixelate_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/pixelate_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/pixelate_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/pixelate_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/pixelate_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/jpeg_compression_1"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/jpeg_compression_2"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/jpeg_compression_3"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/jpeg_compression_4"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

"imagenet2012_corrupted/jpeg_compression_5"

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1000),
})

Statistics

Split Examples
VALIDATION 50,000
ALL 50,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@inproceedings{
  hendrycks2018benchmarking,
  title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},
  author={Dan Hendrycks and Thomas Dietterich},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=HJz6tiCqYm},
}

"kmnist"

Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images), provided in the original MNIST format as well as a NumPy format. Since MNIST restricts us to 10 classes, we chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST.

Features

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
ALL 70,000
TRAIN 60,000
TEST 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@online{clanuwat2018deep,
  author       = {Tarin Clanuwat and Mikel Bober-Irizar and Asanobu Kitamoto and Alex Lamb and Kazuaki Yamamoto and David Ha},
  title        = {Deep Learning for Classical Japanese Literature},
  date         = {2018-12-03},
  year         = {2018},
  eprintclass  = {cs.CV},
  eprinttype   = {arXiv},
  eprint       = {cs.CV/1812.01718},
}

"lsun"

Large-scale images showing different objects from given categories such as bedroom, tower, etc.

lsun is configured with tfds.image.lsun.BuilderConfig and has the following configurations predefined (defaults to the first one):

  • "classroom" (v0.1.1) (Size: 3.06 GiB): Images of category classroom

  • "bedroom" (v0.1.1) (Size: 42.77 GiB): Images of category bedroom

  • "bridge" (v0.1.1) (Size: 15.35 GiB): Images of category bridge

  • "church_outdoor" (v0.1.1) (Size: 2.29 GiB): Images of category church_outdoor

  • "conference_room" (v0.1.1) (Size: 3.78 GiB): Images of category conference_room

  • "dining_room" (v0.1.1) (Size: 10.80 GiB): Images of category dining_room

  • "kitchen" (v0.1.1) (Size: 33.34 GiB): Images of category kitchen

  • "living_room" (v0.1.1) (Size: 21.23 GiB): Images of category living_room

  • "restaurant" (v0.1.1) (Size: 12.57 GiB): Images of category restaurant

  • "tower" (v0.1.1) (Size: 11.19 GiB): Images of category tower

"lsun/classroom"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/bedroom"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/bridge"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/church_outdoor"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/conference_room"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/dining_room"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/kitchen"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/living_room"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/restaurant"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

"lsun/tower"

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
})

Statistics

Split Examples
ALL 708,564
TRAIN 708,264
VALIDATION 300

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{journals/corr/YuZSSX15,
  added-at = {2018-08-13T00:00:00.000+0200},
  author = {Yu, Fisher and Zhang, Yinda and Song, Shuran and Seff, Ari and Xiao, Jianxiong},
  biburl = {https://www.bibsonomy.org/bibtex/2446d4ffb99a5d7d2ab6e5417a12e195f/dblp},
  ee = {http://arxiv.org/abs/1506.03365},
  interhash = {3e9306c4ce2ead125f3b2ab0e25adc85},
  intrahash = {446d4ffb99a5d7d2ab6e5417a12e195f},
  journal = {CoRR},
  keywords = {dblp},
  timestamp = {2018-08-14T15:08:59.000+0200},
  title = {LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop.},
  url = {http://dblp.uni-trier.de/db/journals/corr/corr1506.html#YuZSSX15},
  volume = {abs/1506.03365},
  year = 2015
}

"mnist"

The MNIST database of handwritten digits.

Features

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
ALL 70,000
TRAIN 60,000
TEST 10,000

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{lecun2010mnist,
  title={MNIST handwritten digit database},
  author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
  journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
  volume={2},
  year={2010}
}

"omniglot"

Omniglot data set for one-shot learning. This dataset contains 1623 different handwritten characters from 50 different alphabets.

Features

FeaturesDict({
    'alphabet': ClassLabel(shape=(), dtype=tf.int64, num_classes=50),
    'alphabet_char_id': Tensor(shape=(), dtype=tf.int64),
    'image': Image(shape=(105, 105, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=1623),
})

Statistics

Split Examples
ALL 38,300
TRAIN 19,280
TEST 13,180
SMALL2 3,120
SMALL1 2,720

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{lake2015human,
  title={Human-level concept learning through probabilistic program induction},
  author={Lake, Brenden M and Salakhutdinov, Ruslan and Tenenbaum, Joshua B},
  journal={Science},
  volume={350},
  number={6266},
  pages={1332--1338},
  year={2015},
  publisher={American Association for the Advancement of Science}
}

"open_images_v4"

Open Images is a dataset of ~9M images that have been annotated with image-level labels and object bounding boxes.

The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8.4 per image on average). Moreover, the dataset is annotated with image-level labels spanning thousands of classes.

Features

FeaturesDict({
    'bobjects': SequenceDict({
        'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
        'is_depiction': Tensor(shape=(), dtype=tf.int8),
        'is_group_of': Tensor(shape=(), dtype=tf.int8),
        'is_inside': Tensor(shape=(), dtype=tf.int8),
        'is_occluded': Tensor(shape=(), dtype=tf.int8),
        'is_truncated': Tensor(shape=(), dtype=tf.int8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=19995),
        'source': ClassLabel(shape=(), dtype=tf.int64, num_classes=6),
    }),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
    'objects': SequenceDict({
        'confidence': Tensor(shape=(), dtype=tf.int32),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=19995),
        'source': ClassLabel(shape=(), dtype=tf.int64, num_classes=6),
    }),
})
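
Annotations are stored as sequences, so each example carries a variable number of boxes and labels. A minimal sketch of reading them (TFDS BBoxFeature coordinates are assumed here to be normalized [ymin, xmin, ymax, xmax]):

# Sketch: iterate one example and read its bounding-box annotations.
import tensorflow_datasets as tfds

ds = tfds.load("open_images_v4", split="validation")
for example in tfds.as_numpy(ds.take(1)):
    boxes = example['bobjects']['bbox']    # (num_boxes, 4), normalized coordinates
    labels = example['bobjects']['label']  # (num_boxes,) class indices
    print(boxes.shape, labels[:5])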

Statistics

Split Examples
ALL 1,910,098
TRAIN 1,743,042
TEST 125,436
VALIDATION 41,620

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{OpenImages,
  author = {Alina Kuznetsova and
            Hassan Rom and
            Neil Alldrin and
            Jasper Uijlings and
            Ivan Krasin and
            Jordi Pont-Tuset and
            Shahab Kamali and
            Stefan Popov and
            Matteo Malloci and
            Tom Duerig and
            Vittorio Ferrari},
  title = {The Open Images Dataset V4: Unified image classification,
           object detection, and visual relationship detection at scale},
  year = {2018},
  journal = {arXiv:1811.00982}
}
@article{OpenImages2,
  author = {Krasin, Ivan and
            Duerig, Tom and
            Alldrin, Neil and
            Ferrari, Vittorio
            and Abu-El-Haija, Sami and
            Kuznetsova, Alina and
            Rom, Hassan and
            Uijlings, Jasper and
            Popov, Stefan and
            Kamali, Shahab and
            Malloci, Matteo and
            Pont-Tuset, Jordi and
            Veit, Andreas and
            Belongie, Serge and
            Gomes, Victor and
            Gupta, Abhinav and
            Sun, Chen and
            Chechik, Gal and
            Cai, David and
            Feng, Zheyun and
            Narayanan, Dhyanesh and
            Murphy, Kevin},
  title = {OpenImages: A public dataset for large-scale multi-label and
           multi-class image classification.},
  journal = {Dataset available from
             https://storage.googleapis.com/openimages/web/index.html},
  year={2017}
}

"oxford_iiit_pet"

The Oxford-IIIT pet dataset is a 37 category pet image dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed.

Features

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=37),
})

Statistics

Split Examples
ALL 7,349
TRAIN 3,680
TEST 3,669

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@InProceedings{parkhi12a,
  author       = "Parkhi, O. M. and Vedaldi, A. and Zisserman, A. and Jawahar, C.~V.",
  title        = "Cats and Dogs",
  booktitle    = "IEEE Conference on Computer Vision and Pattern Recognition",
  year         = "2012",
}

"quickdraw_bitmap"

The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The bitmap dataset contains these drawings converted from vector format into 28x28 grayscale images.

Features

FeaturesDict({
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=345),
})

Statistics

Split Examples
TRAIN 50,426,266
ALL 50,426,266

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{DBLP:journals/corr/HaE17,
  author    = {David Ha and
               Douglas Eck},
  title     = {A Neural Representation of Sketch Drawings},
  journal   = {CoRR},
  volume    = {abs/1704.03477},
  year      = {2017},
  url       = {http://arxiv.org/abs/1704.03477},
  archivePrefix = {arXiv},
  eprint    = {1704.03477},
  timestamp = {Mon, 13 Aug 2018 16:48:30 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/HaE17},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"rock_paper_scissors"

Images of hands playing the rock, paper, scissors game.

Features

FeaturesDict({
    'image': Image(shape=(300, 300, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
})

Statistics

Split Examples
ALL 2,892
TRAIN 2,520
TEST 372

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@ONLINE {rps,
author = "Laurence Moroney",
title = "Rock, Paper, Scissors Dataset",
month = "feb",
year = "2019",
url = "http://laurencemoroney.com/rock-paper-scissors-dataset"
}

"shapes3d"

3dshapes is a dataset of 3D shapes procedurally generated from 6 ground truth independent latent factors. These factors are floor colour, wall colour, object colour, scale, shape and orientation.

All possible combinations of these latents are present exactly once, generating N = 10 × 10 × 10 × 8 × 4 × 15 = 480,000 total images.

Latent factor values

  • floor hue: 10 values linearly spaced in [0, 1]
  • wall hue: 10 values linearly spaced in [0, 1]
  • object hue: 10 values linearly spaced in [0, 1]
  • scale: 8 values linearly spaced in [0, 1]
  • shape: 4 values in [0, 1, 2, 3]
  • orientation: 15 values linearly spaced in [-30, 30]

We varied one latent at a time (starting from orientation, then shape, etc), and sequentially stored the images in fixed order in the images array. The corresponding values of the factors are stored in the same order in the labels array.

Features

FeaturesDict({
    'image': Image(shape=(64, 64, 3), dtype=tf.uint8),
    'label_floor_hue': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    'label_object_hue': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    'label_orientation': ClassLabel(shape=(), dtype=tf.int64, num_classes=15),
    'label_scale': ClassLabel(shape=(), dtype=tf.int64, num_classes=8),
    'label_shape': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
    'label_wall_hue': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    'value_floor_hue': Tensor(shape=[], dtype=tf.float32),
    'value_object_hue': Tensor(shape=[], dtype=tf.float32),
    'value_orientation': Tensor(shape=[], dtype=tf.float32),
    'value_scale': Tensor(shape=[], dtype=tf.float32),
    'value_shape': Tensor(shape=[], dtype=tf.float32),
    'value_wall_hue': Tensor(shape=[], dtype=tf.float32),
})
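
Each example carries both the discrete label_* indices and the continuous value_* factors, so they can be read side by side. A minimal sketch:

# Sketch: read an image together with its latent-factor annotations.
import tensorflow_datasets as tfds

ds = tfds.load("shapes3d", split="train")
for example in tfds.as_numpy(ds.take(1)):
    print(example['image'].shape)        # (64, 64, 3)
    print(example['label_shape'])        # discrete index in [0, 4)
    print(example['value_orientation'])  # continuous value in [-30, 30]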

Statistics

Split Examples
TRAIN 480,000
ALL 480,000

Urls

Supervised keys (for as_supervised=True)

None

Citation

@misc{3dshapes18,
  title={3D Shapes Dataset},
  author={Burgess, Chris and Kim, Hyunjik},
  howpublished={https://github.com/deepmind/3dshapes-dataset/},
  year={2018}
}

"smallnorb"


This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 every 20 degrees).

The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).

Features

FeaturesDict({
    'image': Image(shape=(96, 96, 1), dtype=tf.uint8),
    'image2': Image(shape=(96, 96, 1), dtype=tf.uint8),
    'instance': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    'label_azimuth': ClassLabel(shape=(), dtype=tf.int64, num_classes=18),
    'label_category': ClassLabel(shape=(), dtype=tf.int64, num_classes=5),
    'label_elevation': ClassLabel(shape=(), dtype=tf.int64, num_classes=9),
    'label_lighting': ClassLabel(shape=(), dtype=tf.int64, num_classes=6),
})

Statistics

Split Examples
ALL 48,600
TRAIN 24,300
TEST 24,300

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label_category')

Citation

@article{LeCun2004LearningMF,
  title={Learning methods for generic object recognition with invariance to pose and lighting},
  author={Yann LeCun and Fu Jie Huang and L{\'e}on Bottou},
  journal={Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
  year={2004},
  volume={2},
  pages={II-104 Vol.2}
}

"sun397"

The database contains 108,754 images of 397 categories, used in the Scene UNderstanding (SUN) benchmark. The number of images varies across categories, but there are at least 100 images per category.

The official release of the dataset defines 10 overlapping partitions of the dataset, with 50 testing and training images in each. Since TFDS requires the splits not to overlap, we provide a single split for the entire dataset (named "full"). All images are converted to RGB.

Features

FeaturesDict({
    'file_name': Text(shape=(), dtype=tf.string, encoder=None),
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=397),
})

Statistics

Split Examples
FULL 108,753
ALL 108,753

Urls

Supervised keys (for as_supervised=True)

None

Citation

@INPROCEEDINGS{Xiao:2010,
author={J. {Xiao} and J. {Hays} and K. A. {Ehinger} and A. {Oliva} and A. {Torralba}},
booktitle={2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
title={SUN database: Large-scale scene recognition from abbey to zoo},
year={2010},
volume={},
number={},
pages={3485-3492},
keywords={computer vision;human factors;image classification;object recognition;visual databases;SUN database;large-scale scene recognition;abbey;zoo;scene categorization;computer vision;scene understanding research;scene category;object categorization;scene understanding database;state-of-the-art algorithms;human scene classification performance;finer-grained scene representation;Sun;Large-scale systems;Layout;Humans;Image databases;Computer vision;Anthropometry;Bridges;Legged locomotion;Spatial databases}, 
doi={10.1109/CVPR.2010.5539970},
ISSN={1063-6919},
month={June},}

"svhn_cropped"

The Street View House Numbers (SVHN) Dataset is an image digit recognition dataset of over 600,000 digit images obtained from house numbers in Google Street View images, i.e. real-world data. Images are cropped to 32x32.

Features

FeaturesDict({
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
})

Statistics

Split Examples
ALL 630,420
EXTRA 531,131
TRAIN 73,257
TEST 26,032

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@article{Netzer2011,
author = {Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Bo and Ng, Andrew Y},
booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
title = {Reading Digits in Natural Images with Unsupervised Feature Learning},
year = {2011}
}

"tf_flowers"

A large set of images of flowers.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=5),
})

Statistics

Split Examples
TRAIN 3,670
ALL 3,670

Urls

Supervised keys (for as_supervised=True)

(u'image', u'label')

Citation

@ONLINE {tfflowers,
author = "The TensorFlow Team",
title = "Flowers",
month = "jan",
year = "2019",
url = "http://download.tensorflow.org/example_images/flower_photos.tgz" }

"voc2007"

This dataset contains the data from the PASCAL Visual Object Classes Challenge 2007, a.k.a. VOC2007, corresponding to the Classification and Detection competitions. A total of 9,963 images are included in this dataset, where each image contains a set of objects, out of 20 different classes, making a total of 24,640 annotated objects. In the Classification competition, the goal is to predict the set of labels contained in the image, while in the Detection competition the goal is to predict the bounding box and label of each individual object.

Features

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=tf.uint8),
    'image/filename': Text(shape=(), dtype=tf.string, encoder=None),
    'labels': Sequence(shape=(None,), dtype=tf.int64, feature=ClassLabel(shape=(), dtype=tf.int64, num_classes=20)),
    'labels_no_difficult': Sequence(shape=(None,), dtype=tf.int64, feature=ClassLabel(shape=(), dtype=tf.int64, num_classes=20)),
    'objects': SequenceDict({
        'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
        'is_difficult': Tensor(shape=(), dtype=tf.bool),
        'is_truncated': Tensor(shape=(), dtype=tf.bool),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=20),
        'pose': ClassLabel(shape=(), dtype=tf.int64, num_classes=5),
    }),
})

Statistics

Split Examples
ALL 9,963
TEST 4,952
VALIDATION 2,510
TRAIN 2,501

Urls

Supervised keys (for as_supervised=True)

None

Citation

@misc{pascal-voc-2007,
  author = "Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.",
  title = "The {PASCAL} {V}isual {O}bject {C}lasses {C}hallenge 2007 {(VOC2007)} {R}esults",
  howpublished = "http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html"}

structured

"higgs"

The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes. There is an interest in using deep learning methods to obviate the need for physicists to manually develop such features. Benchmark results using Bayesian Decision Trees from a standard physics package and 5-layer neural networks are presented in the original paper.

Features

FeaturesDict({
    'class_label': Tensor(shape=(), dtype=tf.float32),
    'jet_1_b-tag': Tensor(shape=(), dtype=tf.float64),
    'jet_1_eta': Tensor(shape=(), dtype=tf.float64),
    'jet_1_phi': Tensor(shape=(), dtype=tf.float64),
    'jet_1_pt': Tensor(shape=(), dtype=tf.float64),
    'jet_2_b-tag': Tensor(shape=(), dtype=tf.float64),
    'jet_2_eta': Tensor(shape=(), dtype=tf.float64),
    'jet_2_phi': Tensor(shape=(), dtype=tf.float64),
    'jet_2_pt': Tensor(shape=(), dtype=tf.float64),
    'jet_3_b-tag': Tensor(shape=(), dtype=tf.float64),
    'jet_3_eta': Tensor(shape=(), dtype=tf.float64),
    'jet_3_phi': Tensor(shape=(), dtype=tf.float64),
    'jet_3_pt': Tensor(shape=(), dtype=tf.float64),
    'jet_4_b-tag': Tensor(shape=(), dtype=tf.float64),
    'jet_4_eta': Tensor(shape=(), dtype=tf.float64),
    'jet_4_phi': Tensor(shape=(), dtype=tf.float64),
    'jet_4_pt': Tensor(shape=(), dtype=tf.float64),
    'lepton_eta': Tensor(shape=(), dtype=tf.float64),
    'lepton_pT': Tensor(shape=(), dtype=tf.float64),
    'lepton_phi': Tensor(shape=(), dtype=tf.float64),
    'm_bb': Tensor(shape=(), dtype=tf.float64),
    'm_jj': Tensor(shape=(), dtype=tf.float64),
    'm_jjj': Tensor(shape=(), dtype=tf.float64),
    'm_jlv': Tensor(shape=(), dtype=tf.float64),
    'm_lv': Tensor(shape=(), dtype=tf.float64),
    'm_wbb': Tensor(shape=(), dtype=tf.float64),
    'm_wwbb': Tensor(shape=(), dtype=tf.float64),
    'missing_energy_magnitude': Tensor(shape=(), dtype=tf.float64),
    'missing_energy_phi': Tensor(shape=(), dtype=tf.float64),
})
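
Since every feature is a scalar, training typically starts by flattening each example into a single vector. A minimal sketch ('class_label' serves as the target; the input feature order here simply follows the dict's key order):

# Sketch: stack the 28 scalar physics features into one float vector per example.
import tensorflow as tf
import tensorflow_datasets as tfds

ds = tfds.load("higgs", split="train")

def to_vector(example):
    label = example.pop('class_label')  # target; the remaining scalars are inputs
    features = tf.stack([tf.cast(v, tf.float32) for v in example.values()])
    return features, label

ds = ds.map(to_vector)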

Statistics

Split Examples
TRAIN 11,000,000
ALL 11,000,000

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{Baldi:2014kfa,
      author         = "Baldi, Pierre and Sadowski, Peter and Whiteson, Daniel",
      title          = "{Searching for Exotic Particles in High-Energy Physics
                        with Deep Learning}",
      journal        = "Nature Commun.",
      volume         = "5",
      year           = "2014",
      pages          = "4308",
      doi            = "10.1038/ncomms5308",
      eprint         = "1402.4735",
      archivePrefix  = "arXiv",
      primaryClass   = "hep-ph",
      SLACcitation   = "%%CITATION = ARXIV:1402.4735;%%"
}

"titanic"

Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.

Features

FeaturesDict({
    'features': FeaturesDict({
        'age': Tensor(shape=(), dtype=tf.float32),
        'boat': Tensor(shape=(), dtype=tf.string),
        'body': Tensor(shape=(), dtype=tf.int32),
        'cabin': Tensor(shape=(), dtype=tf.string),
        'embarked': ClassLabel(shape=(), dtype=tf.int64, num_classes=4),
        'fare': Tensor(shape=(), dtype=tf.float32),
        'home.dest': Tensor(shape=(), dtype=tf.string),
        'name': Tensor(shape=(), dtype=tf.string),
        'parch': Tensor(shape=(), dtype=tf.int32),
        'pclass': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
        'sex': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
        'sibsp': Tensor(shape=(), dtype=tf.int32),
        'ticket': Tensor(shape=(), dtype=tf.string),
    }),
    'survived': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
})
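
With as_supervised=True, examples arrive as (features, survived) pairs, so the -1 missing-value sentinel described above can be filtered with ordinary tf.data operations. A minimal sketch:

# Sketch: load supervised pairs and drop rows where 'age' is the -1 missing sentinel.
import tensorflow_datasets as tfds

ds = tfds.load("titanic", split="train", as_supervised=True)
ds = ds.filter(lambda features, survived: features['age'] >= 0)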

Statistics

Split Examples
TRAIN 1,309
ALL 1,309

Urls

Supervised keys (for as_supervised=True)

(u'features', u'survived')

Citation

@ONLINE {titanic,
author = "Frank E. Harrell Jr., Thomas Cason",
title  = "Titanic dataset",
month  = "oct",
year   = "2017",
url    = "https://www.openml.org/d/40945"
}

text

"cnn_dailymail"

CNN/DailyMail non-anonymized summarization dataset.

There are two features:

  • article: text of the news article, used as the document to be summarized

  • highlights: joined text of the highlights, with <s> and </s> around each highlight; this is the target summary

cnn_dailymail is configured with tfds.text.cnn_dailymail.CnnDailymailConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.1) (Size: 558.32 MiB): Plain text

  • "bytes" (v0.0.1) (Size: 558.32 MiB): Uses byte-level text encoding with tfds.features.text.ByteTextEncoder

  • "subwords32k" (v0.0.1) (Size: 558.32 MiB): Uses tfds.features.text.SubwordTextEncoder with 32k vocab size

"cnn_dailymail/plain_text"

FeaturesDict({
    'article': Text(shape=(), dtype=tf.string, encoder=None),
    'highlights': Text(shape=(), dtype=tf.string, encoder=None),
})

"cnn_dailymail/bytes"

FeaturesDict({
    'article': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
    'highlights': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
})

"cnn_dailymail/subwords32k"

FeaturesDict({
    'article': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32915>),
    'highlights': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32915>),
})
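
For the encoded configs, the token encoder travels with the dataset via DatasetInfo, so integer sequences can be decoded back to text. A minimal sketch using the subwords32k config:

# Sketch: recover the subword encoder from DatasetInfo and decode token ids.
import tensorflow_datasets as tfds

data, info = tfds.load("cnn_dailymail/subwords32k", with_info=True)
encoder = info.features['article'].encoder
for example in tfds.as_numpy(data['train'].take(1)):
    print(encoder.decode(example['article'][:50]))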

Statistics

Split Examples
ALL 311,971
TRAIN 287,113
VALIDATION 13,368
TEST 11,490

Urls

Supervised keys (for as_supervised=True)

(u'article', u'highlights')

Citation

@article{DBLP:journals/corr/SeeLM17,
  author    = {Abigail See and
               Peter J. Liu and
               Christopher D. Manning},
  title     = {Get To The Point: Summarization with Pointer-Generator Networks},
  journal   = {CoRR},
  volume    = {abs/1704.04368},
  year      = {2017},
  url       = {http://arxiv.org/abs/1704.04368},
  archivePrefix = {arXiv},
  eprint    = {1704.04368},
  timestamp = {Mon, 13 Aug 2018 16:46:08 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/SeeLM17},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"glue"

The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, we construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so if a model memorizes the training examples, it will predict the wrong label on the corresponding development set example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence between a model's score on this task and its score on the unconverted original task. We call the converted dataset WNLI (Winograd NLI).

glue is configured with tfds.text.glue.GlueConfig and has the following configurations predefined (defaults to the first one):

  • "cola" (v0.0.1) (Size: ?? GiB): The Corpus of Linguistic Acceptability consists of English acceptability judgments drawn from books and journal articles on linguistic theory. Each example is a sequence of words annotated with whether it is a grammatical English sentence.

  • "sst2" (v0.0.1) (Size: ?? GiB): The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. We use the two-way (positive/negative) class split, and use only sentence-level labels.

  • "mrpc" (v0.0.1) (Size: ?? GiB): The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

  • "qqp" (v0.0.1) (Size: ?? GiB): The Quora Question Pairs2 dataset is a collection of question pairs from the community question-answering website Quora. The task is to determine whether a pair of questions are semantically equivalent.

  • "stsb" (v0.0.1) (Size: ?? GiB): The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 1 to 5.

  • "mnli_matched" (v0.0.1) (Size: ?? GiB): The Multi-Genre Natural Language Inference Corpusn is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) section. We also use and recommend the SNLI corpus as 550k examples of auxiliary training data.

  • "mnli_mismatched" (v0.0.1) (Size: ?? GiB): The Multi-Genre Natural Language Inference Corpusn is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the mismatched (cross-domain) section. We also use and recommend the SNLI corpus as 550k examples of auxiliary training data.

  • "qnli" (v0.0.1) (Size: ?? GiB): The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator). We convert the task into sentence pair classification by forming a pair between each question and each sentence in the corresponding context, and filtering out pairs with low lexical overlap between the question and the context sentence. The task is to determine whether the context sentence contains the answer to the question. This modified version of the original task removes the requirement that the model select the exact answer, but also removes the simplifying assumptions that the answer is always present in the input and that lexical overlap is a reliable cue.

  • "rte" (v0.0.1) (Size: ?? GiB): The Recognizing Textual Entailment (RTE) datasets come from a series of annual textual entailment challenges. We combine the data from RTE1 (Dagan et al., 2006), RTE2 (Bar Haim et al., 2006), RTE3 (Giampiccolo et al., 2007), and RTE5 (Bentivogli et al., 2009).4 Examples are constructed based on news and Wikipedia text. We convert all datasets to a two-class split, where for three-class datasets we collapse neutral and contradiction into not entailment, for consistency.

  • "wnli" (v0.0.1) (Size: ?? GiB): The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: Each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, we construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the pronoun substituted is entailed by the original sentence. We use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so if a model memorizes the training examples, they will predict the wrong label on corresponding development set example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence between a model's score on this task and its score on the unconverted original task. We call converted dataset WNLI (Winograd NLI).

"glue/cola"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/sst2"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/mrpc"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string, encoder=None),
    'sentence2': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/qqp"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'question1': Text(shape=(), dtype=tf.string, encoder=None),
    'question2': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/stsb"

FeaturesDict({
    'label': Tensor(shape=(), dtype=tf.float32),
    'sentence1': Text(shape=(), dtype=tf.string, encoder=None),
    'sentence2': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/mnli_matched"

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string, encoder=None),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/mnli_mismatched"

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string, encoder=None),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/qnli"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'question': Text(shape=(), dtype=tf.string, encoder=None),
    'sentence': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/rte"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string, encoder=None),
    'sentence2': Text(shape=(), dtype=tf.string, encoder=None),
})

"glue/wnli"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'sentence1': Text(shape=(), dtype=tf.string, encoder=None),
    'sentence2': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

None computed

Urls

Supervised keys (for as_supervised=True)

None

Citation

@inproceedings{levesque2012winograd,
              title={The winograd schema challenge},
              author={Levesque, Hector and Davis, Ernest and Morgenstern, Leora},
              booktitle={Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning},
              year={2012}
            }
@inproceedings{wang2019glue,
  title={ {GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source for the correct citation for each contained dataset.

"imdb_reviews"

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.

imdb_reviews is configured with tfds.text.imdb.IMDBReviewsConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.1) (Size: 80.23 MiB): Plain text

  • "bytes" (v0.0.1) (Size: 80.23 MiB): Uses byte-level text encoding with tfds.features.text.ByteTextEncoder

  • "subwords8k" (v0.0.1) (Size: 80.23 MiB): Uses tfds.features.text.SubwordTextEncoder with 8k vocab size

  • "subwords32k" (v0.0.1) (Size: 80.23 MiB): Uses tfds.features.text.SubwordTextEncoder with 32k vocab size

"imdb_reviews/plain_text"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'text': Text(shape=(), dtype=tf.string, encoder=None),
})

"imdb_reviews/bytes"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
})

"imdb_reviews/subwords8k"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8185>),
})

"imdb_reviews/subwords32k"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32650>),
})
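
Since supervised keys are defined, the reviews can also be consumed directly as (text, label) pairs. A minimal sketch with the plain_text config:

# Sketch: load (text, label) pairs via the supervised keys.
import tensorflow_datasets as tfds

train = tfds.load("imdb_reviews/plain_text", split="train", as_supervised=True)
for text, label in tfds.as_numpy(train.take(2)):
    print(label, text[:80])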

Statistics

Split Examples
ALL 50,000
TRAIN 25,000
TEST 25,000

Urls

Supervised keys (for as_supervised=True)

(u'text', u'label')

Citation

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

"lm1b"

A benchmark corpus to be used for measuring progress in statistical language modeling. This has almost one billion words in the training data.

lm1b is configured with tfds.text.lm1b.Lm1bConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.1) (Size: 1.67 GiB): Plain text

  • "bytes" (v0.0.1) (Size: 1.67 GiB): Uses byte-level text encoding with tfds.features.text.ByteTextEncoder

  • "subwords8k" (v0.0.2) (Size: 1.67 GiB): Uses tfds.features.text.SubwordTextEncoder with 8k vocab size

  • "subwords32k" (v0.0.2) (Size: 1.67 GiB): Uses tfds.features.text.SubwordTextEncoder with 32k vocab size

"lm1b/plain_text"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
})

"lm1b/bytes"

FeaturesDict({
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
})

"lm1b/subwords8k"

FeaturesDict({
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8189>),
})

"lm1b/subwords32k"

FeaturesDict({
    'text': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32711>),
})

Statistics

Split Examples
ALL 30,607,716
TRAIN 30,301,028
TEST 306,688

Urls

Supervised keys (for as_supervised=True)

(u'text', u'text')
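
Since language modeling has no separate label, the supervised keys map both elements of the (input, target) tuple to the same text feature. A minimal sketch (assuming eager execution):

import tensorflow_datasets as tfds

# With as_supervised=True each example is a (text, text) pair;
# shifting the target for next-token prediction is left to the user.
train = tfds.load("lm1b/subwords8k", split="train", as_supervised=True)
for inputs, targets in train.take(1):
    assert inputs.shape == targets.shape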

Citation

@article{DBLP:journals/corr/ChelbaMSGBK13,
  author    = {Ciprian Chelba and
               Tomas Mikolov and
               Mike Schuster and
               Qi Ge and
               Thorsten Brants and
               Phillipp Koehn},
  title     = {One Billion Word Benchmark for Measuring Progress in Statistical Language
               Modeling},
  journal   = {CoRR},
  volume    = {abs/1312.3005},
  year      = {2013},
  url       = {http://arxiv.org/abs/1312.3005},
  archivePrefix = {arXiv},
  eprint    = {1312.3005},
  timestamp = {Mon, 13 Aug 2018 16:46:16 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/ChelbaMSGBK13},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"multi_nli"

The Multi-Genre Natural Language Inference (MultiNLI) corpus is a crowd-sourced collection of 433k sentence pairs annotated with textual entailment information. The corpus is modeled on the SNLI corpus, but differs in that it covers a range of genres of spoken and written text and supports a distinctive cross-genre generalization evaluation. The corpus served as the basis for the shared task of the RepEval 2017 Workshop at EMNLP in Copenhagen.

multi_nli is configured with tfds.text.multi_nli.MultiNLIConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.1) (Size: 216.34 MiB): Plain text

"multi_nli/plain_text"

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=tf.string, encoder=None),
    'label': Text(shape=(), dtype=tf.string, encoder=None),
    'premise': Text(shape=(), dtype=tf.string, encoder=None),
})
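
Note that 'label' here is a plain string feature rather than a ClassLabel, so mapping it to an integer id is left to the user. A minimal sketch (the label strings are assumed to be the standard NLI classes):

import tensorflow_datasets as tfds

# Hypothetical mapping from label strings to integer ids
NLI_LABELS = {"entailment": 0, "neutral": 1, "contradiction": 2}

data = tfds.load("multi_nli/plain_text")
for ex in data['train'].take(1):
    premise = ex['premise'].numpy().decode("utf-8")
    hypothesis = ex['hypothesis'].numpy().decode("utf-8")
    label_id = NLI_LABELS[ex['label'].numpy().decode("utf-8")]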

Statistics

Split Examples
ALL 402,702
TRAIN 392,702
VALIDATION 10,000

Urls

Supervised keys (for as_supervised=True)

None

Citation

@InProceedings{N18-1101,
  author = "Williams, Adina
            and Nangia, Nikita
            and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for
           Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of
               the North American Chapter of the
               Association for Computational Linguistics:
               Human Language Technologies, Volume 1 (Long
               Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}

"squad"

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text (a span) from the corresponding reading passage; some questions may be unanswerable.

squad is configured with tfds.text.squad.SquadConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.1) (Size: 33.51 MiB): Plain text

  • "bytes" (v0.0.1) (Size: 33.51 MiB): Uses byte-level text encoding with tfds.features.text.ByteTextEncoder

  • "subwords8k" (v0.0.1) (Size: 33.51 MiB): Uses tfds.features.text.SubwordTextEncoder with 8k vocab size

  • "subwords32k" (v0.0.2) (Size: 33.51 MiB): Uses tfds.features.text.SubwordTextEncoder with 32k vocab size

"squad/plain_text"

FeaturesDict({
    'context': Text(shape=(), dtype=tf.string, encoder=None),
    'first_answer': Text(shape=(), dtype=tf.string, encoder=None),
    'question': Text(shape=(), dtype=tf.string, encoder=None),
})

"squad/bytes"

FeaturesDict({
    'context': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
    'first_answer': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
    'question': Text(shape=(None,), dtype=tf.int64, encoder=<ByteTextEncoder vocab_size=257>),
})

"squad/subwords8k"

FeaturesDict({
    'context': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8190>),
    'first_answer': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8190>),
    'question': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8190>),
})

"squad/subwords32k"

FeaturesDict({
    'context': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32953>),
    'first_answer': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32953>),
    'question': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=32953>),
})

Statistics

Split Examples
ALL 98,169
TRAIN 87,599
VALIDATION 10,570

Urls

Supervised keys (for as_supervised=True)

(u'', u'')

Citation

@article{2016arXiv160605250R,
       author = { {Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev},
                 Konstantin and {Liang}, Percy},
        title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
      journal = {arXiv e-prints},
         year = 2016,
          eid = {arXiv:1606.05250},
        pages = {arXiv:1606.05250},
archivePrefix = {arXiv},
       eprint = {1606.05250},
}

"wikipedia"

Wikipedia dataset containing cleaned articles in all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/), with one configuration per language. Each example contains the content of one full Wikipedia article, cleaned to strip markup and unwanted sections (references, etc.).

wikipedia is configured with tfds.text.wikipedia.WikipediaConfig and has the following configurations predefined (defaults to the first one):

  • "20190301.aa" (v0.0.2) (Size: 44.09 KiB): Wikipedia dataset for aa, parsed from 20190301 dump.

  • "20190301.ab" (v0.0.2) (Size: 1.31 MiB): Wikipedia dataset for ab, parsed from 20190301 dump.

  • "20190301.ace" (v0.0.2) (Size: 2.66 MiB): Wikipedia dataset for ace, parsed from 20190301 dump.

  • "20190301.ady" (v0.0.2) (Size: 349.43 KiB): Wikipedia dataset for ady, parsed from 20190301 dump.

  • "20190301.af" (v0.0.2) (Size: 84.13 MiB): Wikipedia dataset for af, parsed from 20190301 dump.

  • "20190301.ak" (v0.0.2) (Size: 377.84 KiB): Wikipedia dataset for ak, parsed from 20190301 dump.

  • "20190301.als" (v0.0.2) (Size: 46.90 MiB): Wikipedia dataset for als, parsed from 20190301 dump.

  • "20190301.am" (v0.0.2) (Size: 6.54 MiB): Wikipedia dataset for am, parsed from 20190301 dump.

  • "20190301.an" (v0.0.2) (Size: 31.39 MiB): Wikipedia dataset for an, parsed from 20190301 dump.

  • "20190301.ang" (v0.0.2) (Size: 3.77 MiB): Wikipedia dataset for ang, parsed from 20190301 dump.

  • "20190301.ar" (v0.0.2) (Size: 805.82 MiB): Wikipedia dataset for ar, parsed from 20190301 dump.

  • "20190301.arc" (v0.0.2) (Size: 952.49 KiB): Wikipedia dataset for arc, parsed from 20190301 dump.

  • "20190301.arz" (v0.0.2) (Size: 20.32 MiB): Wikipedia dataset for arz, parsed from 20190301 dump.

  • "20190301.as" (v0.0.2) (Size: 19.06 MiB): Wikipedia dataset for as, parsed from 20190301 dump.

  • "20190301.ast" (v0.0.2) (Size: 216.68 MiB): Wikipedia dataset for ast, parsed from 20190301 dump.

  • "20190301.atj" (v0.0.2) (Size: 467.05 KiB): Wikipedia dataset for atj, parsed from 20190301 dump.

  • "20190301.av" (v0.0.2) (Size: 3.61 MiB): Wikipedia dataset for av, parsed from 20190301 dump.

  • "20190301.ay" (v0.0.2) (Size: 2.06 MiB): Wikipedia dataset for ay, parsed from 20190301 dump.

  • "20190301.az" (v0.0.2) (Size: 163.04 MiB): Wikipedia dataset for az, parsed from 20190301 dump.

  • "20190301.azb" (v0.0.2) (Size: 50.59 MiB): Wikipedia dataset for azb, parsed from 20190301 dump.

  • "20190301.ba" (v0.0.2) (Size: 55.04 MiB): Wikipedia dataset for ba, parsed from 20190301 dump.

  • "20190301.bar" (v0.0.2) (Size: 30.14 MiB): Wikipedia dataset for bar, parsed from 20190301 dump.

  • "20190301.bat-smg" (v0.0.2) (Size: 4.61 MiB): Wikipedia dataset for bat-smg, parsed from 20190301 dump.

  • "20190301.bcl" (v0.0.2) (Size: 6.18 MiB): Wikipedia dataset for bcl, parsed from 20190301 dump.

  • "20190301.be" (v0.0.2) (Size: 192.23 MiB): Wikipedia dataset for be, parsed from 20190301 dump.

  • "20190301.be-x-old" (v0.0.2) (Size: 74.77 MiB): Wikipedia dataset for be-x-old, parsed from 20190301 dump.

  • "20190301.bg" (v0.0.2) (Size: 326.20 MiB): Wikipedia dataset for bg, parsed from 20190301 dump.

  • "20190301.bh" (v0.0.2) (Size: 13.28 MiB): Wikipedia dataset for bh, parsed from 20190301 dump.

  • "20190301.bi" (v0.0.2) (Size: 424.88 KiB): Wikipedia dataset for bi, parsed from 20190301 dump.

  • "20190301.bjn" (v0.0.2) (Size: 2.09 MiB): Wikipedia dataset for bjn, parsed from 20190301 dump.

  • "20190301.bm" (v0.0.2) (Size: 447.98 KiB): Wikipedia dataset for bm, parsed from 20190301 dump.

  • "20190301.bn" (v0.0.2) (Size: 145.04 MiB): Wikipedia dataset for bn, parsed from 20190301 dump.

  • "20190301.bo" (v0.0.2) (Size: 12.41 MiB): Wikipedia dataset for bo, parsed from 20190301 dump.

  • "20190301.bpy" (v0.0.2) (Size: 5.05 MiB): Wikipedia dataset for bpy, parsed from 20190301 dump.

  • "20190301.br" (v0.0.2) (Size: 49.14 MiB): Wikipedia dataset for br, parsed from 20190301 dump.

  • "20190301.bs" (v0.0.2) (Size: 103.26 MiB): Wikipedia dataset for bs, parsed from 20190301 dump.

  • "20190301.bug" (v0.0.2) (Size: 1.76 MiB): Wikipedia dataset for bug, parsed from 20190301 dump.

  • "20190301.bxr" (v0.0.2) (Size: 3.21 MiB): Wikipedia dataset for bxr, parsed from 20190301 dump.

  • "20190301.ca" (v0.0.2) (Size: 849.65 MiB): Wikipedia dataset for ca, parsed from 20190301 dump.

  • "20190301.cbk-zam" (v0.0.2) (Size: 1.84 MiB): Wikipedia dataset for cbk-zam, parsed from 20190301 dump.

  • "20190301.cdo" (v0.0.2) (Size: 3.22 MiB): Wikipedia dataset for cdo, parsed from 20190301 dump.

  • "20190301.ce" (v0.0.2) (Size: 43.89 MiB): Wikipedia dataset for ce, parsed from 20190301 dump.

  • "20190301.ceb" (v0.0.2) (Size: ?? GiB): Wikipedia dataset for ceb, parsed from 20190301 dump.

  • "20190301.ch" (v0.0.2) (Size: 684.97 KiB): Wikipedia dataset for ch, parsed from 20190301 dump.

  • "20190301.cho" (v0.0.2) (Size: 25.99 KiB): Wikipedia dataset for cho, parsed from 20190301 dump.

  • "20190301.chr" (v0.0.2) (Size: 651.25 KiB): Wikipedia dataset for chr, parsed from 20190301 dump.

  • "20190301.chy" (v0.0.2) (Size: 325.90 KiB): Wikipedia dataset for chy, parsed from 20190301 dump.

  • "20190301.ckb" (v0.0.2) (Size: 22.16 MiB): Wikipedia dataset for ckb, parsed from 20190301 dump.

  • "20190301.co" (v0.0.2) (Size: 3.38 MiB): Wikipedia dataset for co, parsed from 20190301 dump.

  • "20190301.cr" (v0.0.2) (Size: 259.71 KiB): Wikipedia dataset for cr, parsed from 20190301 dump.

  • "20190301.crh" (v0.0.2) (Size: 4.01 MiB): Wikipedia dataset for crh, parsed from 20190301 dump.

  • "20190301.cs" (v0.0.2) (Size: 759.21 MiB): Wikipedia dataset for cs, parsed from 20190301 dump.

  • "20190301.csb" (v0.0.2) (Size: 2.03 MiB): Wikipedia dataset for csb, parsed from 20190301 dump.

  • "20190301.cu" (v0.0.2) (Size: 631.49 KiB): Wikipedia dataset for cu, parsed from 20190301 dump.

  • "20190301.cv" (v0.0.2) (Size: 22.23 MiB): Wikipedia dataset for cv, parsed from 20190301 dump.

  • "20190301.cy" (v0.0.2) (Size: 64.37 MiB): Wikipedia dataset for cy, parsed from 20190301 dump.

  • "20190301.da" (v0.0.2) (Size: 323.53 MiB): Wikipedia dataset for da, parsed from 20190301 dump.

  • "20190301.de" (v0.0.2) (Size: 4.97 GiB): Wikipedia dataset for de, parsed from 20190301 dump.

  • "20190301.din" (v0.0.2) (Size: 457.06 KiB): Wikipedia dataset for din, parsed from 20190301 dump.

  • "20190301.diq" (v0.0.2) (Size: 7.24 MiB): Wikipedia dataset for diq, parsed from 20190301 dump.

  • "20190301.dsb" (v0.0.2) (Size: 3.54 MiB): Wikipedia dataset for dsb, parsed from 20190301 dump.

  • "20190301.dty" (v0.0.2) (Size: 4.95 MiB): Wikipedia dataset for dty, parsed from 20190301 dump.

  • "20190301.dv" (v0.0.2) (Size: 4.24 MiB): Wikipedia dataset for dv, parsed from 20190301 dump.

  • "20190301.dz" (v0.0.2) (Size: 360.01 KiB): Wikipedia dataset for dz, parsed from 20190301 dump.

  • "20190301.ee" (v0.0.2) (Size: 434.14 KiB): Wikipedia dataset for ee, parsed from 20190301 dump.

  • "20190301.el" (v0.0.2) (Size: 324.40 MiB): Wikipedia dataset for el, parsed from 20190301 dump.

  • "20190301.eml" (v0.0.2) (Size: 7.72 MiB): Wikipedia dataset for eml, parsed from 20190301 dump.

  • "20190301.en" (v0.0.2) (Size: 15.72 GiB): Wikipedia dataset for en, parsed from 20190301 dump.

  • "20190301.eo" (v0.0.2) (Size: 245.73 MiB): Wikipedia dataset for eo, parsed from 20190301 dump.

  • "20190301.es" (v0.0.2) (Size: 2.93 GiB): Wikipedia dataset for es, parsed from 20190301 dump.

  • "20190301.et" (v0.0.2) (Size: 196.03 MiB): Wikipedia dataset for et, parsed from 20190301 dump.

  • "20190301.eu" (v0.0.2) (Size: 180.35 MiB): Wikipedia dataset for eu, parsed from 20190301 dump.

  • "20190301.ext" (v0.0.2) (Size: 2.40 MiB): Wikipedia dataset for ext, parsed from 20190301 dump.

  • "20190301.fa" (v0.0.2) (Size: 693.84 MiB): Wikipedia dataset for fa, parsed from 20190301 dump.

  • "20190301.ff" (v0.0.2) (Size: 387.75 KiB): Wikipedia dataset for ff, parsed from 20190301 dump.

  • "20190301.fi" (v0.0.2) (Size: 656.44 MiB): Wikipedia dataset for fi, parsed from 20190301 dump.

  • "20190301.fiu-vro" (v0.0.2) (Size: 2.00 MiB): Wikipedia dataset for fiu-vro, parsed from 20190301 dump.

  • "20190301.fj" (v0.0.2) (Size: 262.98 KiB): Wikipedia dataset for fj, parsed from 20190301 dump.

  • "20190301.fo" (v0.0.2) (Size: 13.67 MiB): Wikipedia dataset for fo, parsed from 20190301 dump.

  • "20190301.fr" (v0.0.2) (Size: 4.14 GiB): Wikipedia dataset for fr, parsed from 20190301 dump.

  • "20190301.frp" (v0.0.2) (Size: 2.03 MiB): Wikipedia dataset for frp, parsed from 20190301 dump.

  • "20190301.frr" (v0.0.2) (Size: 7.88 MiB): Wikipedia dataset for frr, parsed from 20190301 dump.

  • "20190301.fur" (v0.0.2) (Size: 2.29 MiB): Wikipedia dataset for fur, parsed from 20190301 dump.

  • "20190301.fy" (v0.0.2) (Size: 45.52 MiB): Wikipedia dataset for fy, parsed from 20190301 dump.

  • "20190301.ga" (v0.0.2) (Size: 24.78 MiB): Wikipedia dataset for ga, parsed from 20190301 dump.

  • "20190301.gag" (v0.0.2) (Size: 2.04 MiB): Wikipedia dataset for gag, parsed from 20190301 dump.

  • "20190301.gan" (v0.0.2) (Size: 3.82 MiB): Wikipedia dataset for gan, parsed from 20190301 dump.

  • "20190301.gd" (v0.0.2) (Size: 8.51 MiB): Wikipedia dataset for gd, parsed from 20190301 dump.

  • "20190301.gl" (v0.0.2) (Size: 235.07 MiB): Wikipedia dataset for gl, parsed from 20190301 dump.

  • "20190301.glk" (v0.0.2) (Size: 1.91 MiB): Wikipedia dataset for glk, parsed from 20190301 dump.

  • "20190301.gn" (v0.0.2) (Size: 3.37 MiB): Wikipedia dataset for gn, parsed from 20190301 dump.

  • "20190301.gom" (v0.0.2) (Size: 6.07 MiB): Wikipedia dataset for gom, parsed from 20190301 dump.

  • "20190301.gor" (v0.0.2) (Size: 1.28 MiB): Wikipedia dataset for gor, parsed from 20190301 dump.

  • "20190301.got" (v0.0.2) (Size: 604.10 KiB): Wikipedia dataset for got, parsed from 20190301 dump.

  • "20190301.gu" (v0.0.2) (Size: 27.23 MiB): Wikipedia dataset for gu, parsed from 20190301 dump.

  • "20190301.gv" (v0.0.2) (Size: 5.32 MiB): Wikipedia dataset for gv, parsed from 20190301 dump.

  • "20190301.ha" (v0.0.2) (Size: 1.62 MiB): Wikipedia dataset for ha, parsed from 20190301 dump.

  • "20190301.hak" (v0.0.2) (Size: 3.28 MiB): Wikipedia dataset for hak, parsed from 20190301 dump.

  • "20190301.haw" (v0.0.2) (Size: 1017.76 KiB): Wikipedia dataset for haw, parsed from 20190301 dump.

  • "20190301.he" (v0.0.2) (Size: 572.30 MiB): Wikipedia dataset for he, parsed from 20190301 dump.

  • "20190301.hi" (v0.0.2) (Size: 137.86 MiB): Wikipedia dataset for hi, parsed from 20190301 dump.

  • "20190301.hif" (v0.0.2) (Size: 4.57 MiB): Wikipedia dataset for hif, parsed from 20190301 dump.

  • "20190301.ho" (v0.0.2) (Size: 18.37 KiB): Wikipedia dataset for ho, parsed from 20190301 dump.

  • "20190301.hr" (v0.0.2) (Size: 246.05 MiB): Wikipedia dataset for hr, parsed from 20190301 dump.

  • "20190301.hsb" (v0.0.2) (Size: 10.38 MiB): Wikipedia dataset for hsb, parsed from 20190301 dump.

  • "20190301.ht" (v0.0.2) (Size: 10.23 MiB): Wikipedia dataset for ht, parsed from 20190301 dump.

  • "20190301.hu" (v0.0.2) (Size: 810.17 MiB): Wikipedia dataset for hu, parsed from 20190301 dump.

  • "20190301.hy" (v0.0.2) (Size: 277.53 MiB): Wikipedia dataset for hy, parsed from 20190301 dump.

  • "20190301.hz" (v0.0.2) (Size: 16.35 KiB): Wikipedia dataset for hz, parsed from 20190301 dump.

  • "20190301.ia" (v0.0.2) (Size: 7.85 MiB): Wikipedia dataset for ia, parsed from 20190301 dump.

  • "20190301.id" (v0.0.2) (Size: 523.94 MiB): Wikipedia dataset for id, parsed from 20190301 dump.

  • "20190301.ie" (v0.0.2) (Size: 1.70 MiB): Wikipedia dataset for ie, parsed from 20190301 dump.

  • "20190301.ig" (v0.0.2) (Size: 1.00 MiB): Wikipedia dataset for ig, parsed from 20190301 dump.

  • "20190301.ii" (v0.0.2) (Size: 30.88 KiB): Wikipedia dataset for ii, parsed from 20190301 dump.

  • "20190301.ik" (v0.0.2) (Size: 238.12 KiB): Wikipedia dataset for ik, parsed from 20190301 dump.

  • "20190301.ilo" (v0.0.2) (Size: 15.22 MiB): Wikipedia dataset for ilo, parsed from 20190301 dump.

  • "20190301.inh" (v0.0.2) (Size: 1.26 MiB): Wikipedia dataset for inh, parsed from 20190301 dump.

  • "20190301.io" (v0.0.2) (Size: 12.56 MiB): Wikipedia dataset for io, parsed from 20190301 dump.

  • "20190301.is" (v0.0.2) (Size: 41.86 MiB): Wikipedia dataset for is, parsed from 20190301 dump.

  • "20190301.it" (v0.0.2) (Size: 2.66 GiB): Wikipedia dataset for it, parsed from 20190301 dump.

  • "20190301.iu" (v0.0.2) (Size: 284.06 KiB): Wikipedia dataset for iu, parsed from 20190301 dump.

  • "20190301.ja" (v0.0.2) (Size: 2.74 GiB): Wikipedia dataset for ja, parsed from 20190301 dump.

  • "20190301.jam" (v0.0.2) (Size: 895.29 KiB): Wikipedia dataset for jam, parsed from 20190301 dump.

  • "20190301.jbo" (v0.0.2) (Size: 1.06 MiB): Wikipedia dataset for jbo, parsed from 20190301 dump.

  • "20190301.jv" (v0.0.2) (Size: 39.32 MiB): Wikipedia dataset for jv, parsed from 20190301 dump.

  • "20190301.ka" (v0.0.2) (Size: 131.78 MiB): Wikipedia dataset for ka, parsed from 20190301 dump.

  • "20190301.kaa" (v0.0.2) (Size: 1.35 MiB): Wikipedia dataset for kaa, parsed from 20190301 dump.

  • "20190301.kab" (v0.0.2) (Size: 3.62 MiB): Wikipedia dataset for kab, parsed from 20190301 dump.

  • "20190301.kbd" (v0.0.2) (Size: 1.65 MiB): Wikipedia dataset for kbd, parsed from 20190301 dump.

  • "20190301.kbp" (v0.0.2) (Size: 1.24 MiB): Wikipedia dataset for kbp, parsed from 20190301 dump.

  • "20190301.kg" (v0.0.2) (Size: 439.26 KiB): Wikipedia dataset for kg, parsed from 20190301 dump.

  • "20190301.ki" (v0.0.2) (Size: 370.78 KiB): Wikipedia dataset for ki, parsed from 20190301 dump.

  • "20190301.kj" (v0.0.2) (Size: 16.58 KiB): Wikipedia dataset for kj, parsed from 20190301 dump.

  • "20190301.kk" (v0.0.2) (Size: 113.46 MiB): Wikipedia dataset for kk, parsed from 20190301 dump.

  • "20190301.kl" (v0.0.2) (Size: 862.51 KiB): Wikipedia dataset for kl, parsed from 20190301 dump.

  • "20190301.km" (v0.0.2) (Size: 21.92 MiB): Wikipedia dataset for km, parsed from 20190301 dump.

  • "20190301.kn" (v0.0.2) (Size: 69.62 MiB): Wikipedia dataset for kn, parsed from 20190301 dump.

  • "20190301.ko" (v0.0.2) (Size: 625.16 MiB): Wikipedia dataset for ko, parsed from 20190301 dump.

  • "20190301.koi" (v0.0.2) (Size: 2.12 MiB): Wikipedia dataset for koi, parsed from 20190301 dump.

  • "20190301.kr" (v0.0.2) (Size: 13.89 KiB): Wikipedia dataset for kr, parsed from 20190301 dump.

  • "20190301.krc" (v0.0.2) (Size: 3.16 MiB): Wikipedia dataset for krc, parsed from 20190301 dump.

  • "20190301.ks" (v0.0.2) (Size: 309.15 KiB): Wikipedia dataset for ks, parsed from 20190301 dump.

  • "20190301.ksh" (v0.0.2) (Size: 3.07 MiB): Wikipedia dataset for ksh, parsed from 20190301 dump.

  • "20190301.ku" (v0.0.2) (Size: 17.09 MiB): Wikipedia dataset for ku, parsed from 20190301 dump.

  • "20190301.kv" (v0.0.2) (Size: 3.36 MiB): Wikipedia dataset for kv, parsed from 20190301 dump.

  • "20190301.kw" (v0.0.2) (Size: 1.71 MiB): Wikipedia dataset for kw, parsed from 20190301 dump.

  • "20190301.ky" (v0.0.2) (Size: 33.13 MiB): Wikipedia dataset for ky, parsed from 20190301 dump.

  • "20190301.la" (v0.0.2) (Size: 82.72 MiB): Wikipedia dataset for la, parsed from 20190301 dump.

  • "20190301.lad" (v0.0.2) (Size: 3.39 MiB): Wikipedia dataset for lad, parsed from 20190301 dump.

  • "20190301.lb" (v0.0.2) (Size: 45.70 MiB): Wikipedia dataset for lb, parsed from 20190301 dump.

  • "20190301.lbe" (v0.0.2) (Size: 1.22 MiB): Wikipedia dataset for lbe, parsed from 20190301 dump.

  • "20190301.lez" (v0.0.2) (Size: 4.16 MiB): Wikipedia dataset for lez, parsed from 20190301 dump.

  • "20190301.lfn" (v0.0.2) (Size: 2.81 MiB): Wikipedia dataset for lfn, parsed from 20190301 dump.

  • "20190301.lg" (v0.0.2) (Size: 1.58 MiB): Wikipedia dataset for lg, parsed from 20190301 dump.

  • "20190301.li" (v0.0.2) (Size: 13.86 MiB): Wikipedia dataset for li, parsed from 20190301 dump.

  • "20190301.lij" (v0.0.2) (Size: 2.73 MiB): Wikipedia dataset for lij, parsed from 20190301 dump.

  • "20190301.lmo" (v0.0.2) (Size: 21.34 MiB): Wikipedia dataset for lmo, parsed from 20190301 dump.

  • "20190301.ln" (v0.0.2) (Size: 1.83 MiB): Wikipedia dataset for ln, parsed from 20190301 dump.

  • "20190301.lo" (v0.0.2) (Size: 3.44 MiB): Wikipedia dataset for lo, parsed from 20190301 dump.

  • "20190301.lrc" (v0.0.2) (Size: 4.71 MiB): Wikipedia dataset for lrc, parsed from 20190301 dump.

  • "20190301.lt" (v0.0.2) (Size: 174.73 MiB): Wikipedia dataset for lt, parsed from 20190301 dump.

  • "20190301.ltg" (v0.0.2) (Size: 798.18 KiB): Wikipedia dataset for ltg, parsed from 20190301 dump.

  • "20190301.lv" (v0.0.2) (Size: 127.47 MiB): Wikipedia dataset for lv, parsed from 20190301 dump.

  • "20190301.mai" (v0.0.2) (Size: 10.80 MiB): Wikipedia dataset for mai, parsed from 20190301 dump.

  • "20190301.map-bms" (v0.0.2) (Size: 4.49 MiB): Wikipedia dataset for map-bms, parsed from 20190301 dump.

  • "20190301.mdf" (v0.0.2) (Size: 1.04 MiB): Wikipedia dataset for mdf, parsed from 20190301 dump.

  • "20190301.mg" (v0.0.2) (Size: 25.64 MiB): Wikipedia dataset for mg, parsed from 20190301 dump.

  • "20190301.mh" (v0.0.2) (Size: 27.71 KiB): Wikipedia dataset for mh, parsed from 20190301 dump.

  • "20190301.mhr" (v0.0.2) (Size: 5.69 MiB): Wikipedia dataset for mhr, parsed from 20190301 dump.

  • "20190301.mi" (v0.0.2) (Size: 1.96 MiB): Wikipedia dataset for mi, parsed from 20190301 dump.

  • "20190301.min" (v0.0.2) (Size: 25.05 MiB): Wikipedia dataset for min, parsed from 20190301 dump.

  • "20190301.mk" (v0.0.2) (Size: 140.69 MiB): Wikipedia dataset for mk, parsed from 20190301 dump.

  • "20190301.ml" (v0.0.2) (Size: 117.24 MiB): Wikipedia dataset for ml, parsed from 20190301 dump.

  • "20190301.mn" (v0.0.2) (Size: 28.23 MiB): Wikipedia dataset for mn, parsed from 20190301 dump.

  • "20190301.mr" (v0.0.2) (Size: 49.58 MiB): Wikipedia dataset for mr, parsed from 20190301 dump.

  • "20190301.mrj" (v0.0.2) (Size: 3.01 MiB): Wikipedia dataset for mrj, parsed from 20190301 dump.

  • "20190301.ms" (v0.0.2) (Size: 205.79 MiB): Wikipedia dataset for ms, parsed from 20190301 dump.

  • "20190301.mt" (v0.0.2) (Size: 8.21 MiB): Wikipedia dataset for mt, parsed from 20190301 dump.

  • "20190301.mus" (v0.0.2) (Size: 14.20 KiB): Wikipedia dataset for mus, parsed from 20190301 dump.

  • "20190301.mwl" (v0.0.2) (Size: 8.95 MiB): Wikipedia dataset for mwl, parsed from 20190301 dump.

  • "20190301.my" (v0.0.2) (Size: 34.60 MiB): Wikipedia dataset for my, parsed from 20190301 dump.

  • "20190301.myv" (v0.0.2) (Size: 7.79 MiB): Wikipedia dataset for myv, parsed from 20190301 dump.

  • "20190301.mzn" (v0.0.2) (Size: 6.47 MiB): Wikipedia dataset for mzn, parsed from 20190301 dump.

  • "20190301.na" (v0.0.2) (Size: 480.57 KiB): Wikipedia dataset for na, parsed from 20190301 dump.

  • "20190301.nah" (v0.0.2) (Size: 4.30 MiB): Wikipedia dataset for nah, parsed from 20190301 dump.

  • "20190301.nap" (v0.0.2) (Size: 5.55 MiB): Wikipedia dataset for nap, parsed from 20190301 dump.

  • "20190301.nds" (v0.0.2) (Size: 33.28 MiB): Wikipedia dataset for nds, parsed from 20190301 dump.

  • "20190301.nds-nl" (v0.0.2) (Size: 6.67 MiB): Wikipedia dataset for nds-nl, parsed from 20190301 dump.

  • "20190301.ne" (v0.0.2) (Size: 29.26 MiB): Wikipedia dataset for ne, parsed from 20190301 dump.

  • "20190301.new" (v0.0.2) (Size: 16.91 MiB): Wikipedia dataset for new, parsed from 20190301 dump.

  • "20190301.ng" (v0.0.2) (Size: 91.11 KiB): Wikipedia dataset for ng, parsed from 20190301 dump.

  • "20190301.nl" (v0.0.2) (Size: 1.38 GiB): Wikipedia dataset for nl, parsed from 20190301 dump.

  • "20190301.nn" (v0.0.2) (Size: 126.01 MiB): Wikipedia dataset for nn, parsed from 20190301 dump.

  • "20190301.no" (v0.0.2) (Size: 610.74 MiB): Wikipedia dataset for no, parsed from 20190301 dump.

  • "20190301.nov" (v0.0.2) (Size: 1.12 MiB): Wikipedia dataset for nov, parsed from 20190301 dump.

  • "20190301.nrm" (v0.0.2) (Size: 1.56 MiB): Wikipedia dataset for nrm, parsed from 20190301 dump.

  • "20190301.nso" (v0.0.2) (Size: 2.20 MiB): Wikipedia dataset for nso, parsed from 20190301 dump.

  • "20190301.nv" (v0.0.2) (Size: 2.52 MiB): Wikipedia dataset for nv, parsed from 20190301 dump.

  • "20190301.ny" (v0.0.2) (Size: 1.18 MiB): Wikipedia dataset for ny, parsed from 20190301 dump.

  • "20190301.oc" (v0.0.2) (Size: 70.97 MiB): Wikipedia dataset for oc, parsed from 20190301 dump.

  • "20190301.olo" (v0.0.2) (Size: 1.55 MiB): Wikipedia dataset for olo, parsed from 20190301 dump.

  • "20190301.om" (v0.0.2) (Size: 1.06 MiB): Wikipedia dataset for om, parsed from 20190301 dump.

  • "20190301.or" (v0.0.2) (Size: 24.90 MiB): Wikipedia dataset for or, parsed from 20190301 dump.

  • "20190301.os" (v0.0.2) (Size: 7.31 MiB): Wikipedia dataset for os, parsed from 20190301 dump.

  • "20190301.pa" (v0.0.2) (Size: 40.39 MiB): Wikipedia dataset for pa, parsed from 20190301 dump.

  • "20190301.pag" (v0.0.2) (Size: 1.29 MiB): Wikipedia dataset for pag, parsed from 20190301 dump.

  • "20190301.pam" (v0.0.2) (Size: 8.17 MiB): Wikipedia dataset for pam, parsed from 20190301 dump.

  • "20190301.pap" (v0.0.2) (Size: 1.33 MiB): Wikipedia dataset for pap, parsed from 20190301 dump.

  • "20190301.pcd" (v0.0.2) (Size: 4.14 MiB): Wikipedia dataset for pcd, parsed from 20190301 dump.

  • "20190301.pdc" (v0.0.2) (Size: 1.10 MiB): Wikipedia dataset for pdc, parsed from 20190301 dump.

  • "20190301.pfl" (v0.0.2) (Size: 3.22 MiB): Wikipedia dataset for pfl, parsed from 20190301 dump.

  • "20190301.pi" (v0.0.2) (Size: 586.77 KiB): Wikipedia dataset for pi, parsed from 20190301 dump.

  • "20190301.pih" (v0.0.2) (Size: 654.11 KiB): Wikipedia dataset for pih, parsed from 20190301 dump.

  • "20190301.pl" (v0.0.2) (Size: 1.76 GiB): Wikipedia dataset for pl, parsed from 20190301 dump.

  • "20190301.pms" (v0.0.2) (Size: 13.42 MiB): Wikipedia dataset for pms, parsed from 20190301 dump.

  • "20190301.pnb" (v0.0.2) (Size: 24.31 MiB): Wikipedia dataset for pnb, parsed from 20190301 dump.

  • "20190301.pnt" (v0.0.2) (Size: 533.84 KiB): Wikipedia dataset for pnt, parsed from 20190301 dump.

  • "20190301.ps" (v0.0.2) (Size: 14.09 MiB): Wikipedia dataset for ps, parsed from 20190301 dump.

  • "20190301.pt" (v0.0.2) (Size: 1.58 GiB): Wikipedia dataset for pt, parsed from 20190301 dump.

  • "20190301.qu" (v0.0.2) (Size: 11.42 MiB): Wikipedia dataset for qu, parsed from 20190301 dump.

  • "20190301.rm" (v0.0.2) (Size: 5.85 MiB): Wikipedia dataset for rm, parsed from 20190301 dump.

  • "20190301.rmy" (v0.0.2) (Size: 509.61 KiB): Wikipedia dataset for rmy, parsed from 20190301 dump.

  • "20190301.rn" (v0.0.2) (Size: 779.25 KiB): Wikipedia dataset for rn, parsed from 20190301 dump.

  • "20190301.ro" (v0.0.2) (Size: 449.49 MiB): Wikipedia dataset for ro, parsed from 20190301 dump.

  • "20190301.roa-rup" (v0.0.2) (Size: 931.23 KiB): Wikipedia dataset for roa-rup, parsed from 20190301 dump.

  • "20190301.roa-tara" (v0.0.2) (Size: 5.98 MiB): Wikipedia dataset for roa-tara, parsed from 20190301 dump.

  • "20190301.ru" (v0.0.2) (Size: 3.51 GiB): Wikipedia dataset for ru, parsed from 20190301 dump.

  • "20190301.rue" (v0.0.2) (Size: 4.11 MiB): Wikipedia dataset for rue, parsed from 20190301 dump.

  • "20190301.rw" (v0.0.2) (Size: 904.81 KiB): Wikipedia dataset for rw, parsed from 20190301 dump.

  • "20190301.sa" (v0.0.2) (Size: 14.29 MiB): Wikipedia dataset for sa, parsed from 20190301 dump.

  • "20190301.sah" (v0.0.2) (Size: 11.88 MiB): Wikipedia dataset for sah, parsed from 20190301 dump.

  • "20190301.sat" (v0.0.2) (Size: 2.36 MiB): Wikipedia dataset for sat, parsed from 20190301 dump.

  • "20190301.sc" (v0.0.2) (Size: 4.39 MiB): Wikipedia dataset for sc, parsed from 20190301 dump.

  • "20190301.scn" (v0.0.2) (Size: 11.83 MiB): Wikipedia dataset for scn, parsed from 20190301 dump.

  • "20190301.sco" (v0.0.2) (Size: 57.80 MiB): Wikipedia dataset for sco, parsed from 20190301 dump.

  • "20190301.sd" (v0.0.2) (Size: 12.62 MiB): Wikipedia dataset for sd, parsed from 20190301 dump.

  • "20190301.se" (v0.0.2) (Size: 3.30 MiB): Wikipedia dataset for se, parsed from 20190301 dump.

  • "20190301.sg" (v0.0.2) (Size: 286.02 KiB): Wikipedia dataset for sg, parsed from 20190301 dump.

  • "20190301.sh" (v0.0.2) (Size: 406.72 MiB): Wikipedia dataset for sh, parsed from 20190301 dump.

  • "20190301.si" (v0.0.2) (Size: 36.84 MiB): Wikipedia dataset for si, parsed from 20190301 dump.

  • "20190301.simple" (v0.0.2) (Size: 156.11 MiB): Wikipedia dataset for simple, parsed from 20190301 dump.

  • "20190301.sk" (v0.0.2) (Size: 254.37 MiB): Wikipedia dataset for sk, parsed from 20190301 dump.

  • "20190301.sl" (v0.0.2) (Size: 201.41 MiB): Wikipedia dataset for sl, parsed from 20190301 dump.

  • "20190301.sm" (v0.0.2) (Size: 678.46 KiB): Wikipedia dataset for sm, parsed from 20190301 dump.

  • "20190301.sn" (v0.0.2) (Size: 2.02 MiB): Wikipedia dataset for sn, parsed from 20190301 dump.

  • "20190301.so" (v0.0.2) (Size: 8.17 MiB): Wikipedia dataset for so, parsed from 20190301 dump.

  • "20190301.sq" (v0.0.2) (Size: 77.55 MiB): Wikipedia dataset for sq, parsed from 20190301 dump.

  • "20190301.sr" (v0.0.2) (Size: 725.30 MiB): Wikipedia dataset for sr, parsed from 20190301 dump.

  • "20190301.srn" (v0.0.2) (Size: 634.21 KiB): Wikipedia dataset for srn, parsed from 20190301 dump.

  • "20190301.ss" (v0.0.2) (Size: 737.58 KiB): Wikipedia dataset for ss, parsed from 20190301 dump.

  • "20190301.st" (v0.0.2) (Size: 482.27 KiB): Wikipedia dataset for st, parsed from 20190301 dump.

  • "20190301.stq" (v0.0.2) (Size: 3.26 MiB): Wikipedia dataset for stq, parsed from 20190301 dump.

  • "20190301.su" (v0.0.2) (Size: 20.52 MiB): Wikipedia dataset for su, parsed from 20190301 dump.

  • "20190301.sv" (v0.0.2) (Size: ?? GiB): Wikipedia dataset for sv, parsed from 20190301 dump.

  • "20190301.sw" (v0.0.2) (Size: 27.60 MiB): Wikipedia dataset for sw, parsed from 20190301 dump.

  • "20190301.szl" (v0.0.2) (Size: 4.06 MiB): Wikipedia dataset for szl, parsed from 20190301 dump.

  • "20190301.ta" (v0.0.2) (Size: 141.07 MiB): Wikipedia dataset for ta, parsed from 20190301 dump.

  • "20190301.tcy" (v0.0.2) (Size: 2.33 MiB): Wikipedia dataset for tcy, parsed from 20190301 dump.

  • "20190301.te" (v0.0.2) (Size: 113.16 MiB): Wikipedia dataset for te, parsed from 20190301 dump.

  • "20190301.tet" (v0.0.2) (Size: 1.06 MiB): Wikipedia dataset for tet, parsed from 20190301 dump.

  • "20190301.tg" (v0.0.2) (Size: 36.95 MiB): Wikipedia dataset for tg, parsed from 20190301 dump.

  • "20190301.th" (v0.0.2) (Size: 254.00 MiB): Wikipedia dataset for th, parsed from 20190301 dump.

  • "20190301.ti" (v0.0.2) (Size: 309.72 KiB): Wikipedia dataset for ti, parsed from 20190301 dump.

  • "20190301.tk" (v0.0.2) (Size: 4.50 MiB): Wikipedia dataset for tk, parsed from 20190301 dump.

  • "20190301.tl" (v0.0.2) (Size: 50.85 MiB): Wikipedia dataset for tl, parsed from 20190301 dump.

  • "20190301.tn" (v0.0.2) (Size: 1.21 MiB): Wikipedia dataset for tn, parsed from 20190301 dump.

  • "20190301.to" (v0.0.2) (Size: 775.10 KiB): Wikipedia dataset for to, parsed from 20190301 dump.

  • "20190301.tpi" (v0.0.2) (Size: 1.39 MiB): Wikipedia dataset for tpi, parsed from 20190301 dump.

  • "20190301.tr" (v0.0.2) (Size: 497.19 MiB): Wikipedia dataset for tr, parsed from 20190301 dump.

  • "20190301.ts" (v0.0.2) (Size: 1.39 MiB): Wikipedia dataset for ts, parsed from 20190301 dump.

  • "20190301.tt" (v0.0.2) (Size: 53.23 MiB): Wikipedia dataset for tt, parsed from 20190301 dump.

  • "20190301.tum" (v0.0.2) (Size: 309.58 KiB): Wikipedia dataset for tum, parsed from 20190301 dump.

  • "20190301.tw" (v0.0.2) (Size: 345.96 KiB): Wikipedia dataset for tw, parsed from 20190301 dump.

  • "20190301.ty" (v0.0.2) (Size: 485.56 KiB): Wikipedia dataset for ty, parsed from 20190301 dump.

  • "20190301.tyv" (v0.0.2) (Size: 2.60 MiB): Wikipedia dataset for tyv, parsed from 20190301 dump.

  • "20190301.udm" (v0.0.2) (Size: 2.94 MiB): Wikipedia dataset for udm, parsed from 20190301 dump.

  • "20190301.ug" (v0.0.2) (Size: 5.64 MiB): Wikipedia dataset for ug, parsed from 20190301 dump.

  • "20190301.uk" (v0.0.2) (Size: 1.28 GiB): Wikipedia dataset for uk, parsed from 20190301 dump.

  • "20190301.ur" (v0.0.2) (Size: 129.57 MiB): Wikipedia dataset for ur, parsed from 20190301 dump.

  • "20190301.uz" (v0.0.2) (Size: 60.85 MiB): Wikipedia dataset for uz, parsed from 20190301 dump.

  • "20190301.ve" (v0.0.2) (Size: 257.59 KiB): Wikipedia dataset for ve, parsed from 20190301 dump.

  • "20190301.vec" (v0.0.2) (Size: 10.65 MiB): Wikipedia dataset for vec, parsed from 20190301 dump.

  • "20190301.vep" (v0.0.2) (Size: 4.59 MiB): Wikipedia dataset for vep, parsed from 20190301 dump.

  • "20190301.vi" (v0.0.2) (Size: 623.74 MiB): Wikipedia dataset for vi, parsed from 20190301 dump.

  • "20190301.vls" (v0.0.2) (Size: 6.58 MiB): Wikipedia dataset for vls, parsed from 20190301 dump.

  • "20190301.vo" (v0.0.2) (Size: 23.80 MiB): Wikipedia dataset for vo, parsed from 20190301 dump.

  • "20190301.wa" (v0.0.2) (Size: 8.75 MiB): Wikipedia dataset for wa, parsed from 20190301 dump.

  • "20190301.war" (v0.0.2) (Size: 256.72 MiB): Wikipedia dataset for war, parsed from 20190301 dump.

  • "20190301.wo" (v0.0.2) (Size: 1.54 MiB): Wikipedia dataset for wo, parsed from 20190301 dump.

  • "20190301.wuu" (v0.0.2) (Size: 9.08 MiB): Wikipedia dataset for wuu, parsed from 20190301 dump.

  • "20190301.xal" (v0.0.2) (Size: 1.64 MiB): Wikipedia dataset for xal, parsed from 20190301 dump.

  • "20190301.xh" (v0.0.2) (Size: 1.26 MiB): Wikipedia dataset for xh, parsed from 20190301 dump.

  • "20190301.xmf" (v0.0.2) (Size: 9.40 MiB): Wikipedia dataset for xmf, parsed from 20190301 dump.

  • "20190301.yi" (v0.0.2) (Size: 11.56 MiB): Wikipedia dataset for yi, parsed from 20190301 dump.

  • "20190301.yo" (v0.0.2) (Size: 11.55 MiB): Wikipedia dataset for yo, parsed from 20190301 dump.

  • "20190301.za" (v0.0.2) (Size: 735.93 KiB): Wikipedia dataset for za, parsed from 20190301 dump.

  • "20190301.zea" (v0.0.2) (Size: 2.47 MiB): Wikipedia dataset for zea, parsed from 20190301 dump.

  • "20190301.zh" (v0.0.2) (Size: 1.71 GiB): Wikipedia dataset for zh, parsed from 20190301 dump.

  • "20190301.zh-classical" (v0.0.2) (Size: 13.37 MiB): Wikipedia dataset for zh-classical, parsed from 20190301 dump.

  • "20190301.zh-min-nan" (v0.0.2) (Size: 50.30 MiB): Wikipedia dataset for zh-min-nan, parsed from 20190301 dump.

  • "20190301.zh-yue" (v0.0.2) (Size: 52.41 MiB): Wikipedia dataset for zh-yue, parsed from 20190301 dump.

  • "20190301.zu" (v0.0.2) (Size: 1.50 MiB): Wikipedia dataset for zu, parsed from 20190301 dump.

"wikipedia/20190301.aa"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ab"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ace"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ady"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.af"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ak"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.als"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.am"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.an"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ang"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ar"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.arc"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.arz"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.as"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ast"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.atj"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.av"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ay"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.az"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.azb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ba"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bar"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bat-smg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bcl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.be"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.be-x-old"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bjn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bm"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bpy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.br"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bs"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bug"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.bxr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ca"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cbk-zam"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cdo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ce"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ceb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ch"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cho"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.chr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.chy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ckb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.co"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.crh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cs"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.csb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.cy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.da"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.de"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.din"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.diq"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.dsb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.dty"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.dv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.dz"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ee"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.el"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.eml"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.en"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.eo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.es"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.et"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.eu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ext"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fa"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ff"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fiu-vro"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fj"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.frp"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.frr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fur"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.fy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ga"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gag"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gan"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gd"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.glk"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gom"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gor"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.got"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.gv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ha"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hak"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.haw"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.he"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hif"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ho"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hsb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ht"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.hz"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ia"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.id"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ie"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ig"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ii"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ik"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ilo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.inh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.io"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.is"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.it"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.iu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ja"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.jam"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.jbo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.jv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ka"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kaa"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kab"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kbd"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kbp"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ki"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kj"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kk"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.km"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ko"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.koi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.krc"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ks"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ksh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ku"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.kw"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ky"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.la"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lad"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lbe"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lez"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lfn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.li"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lij"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lmo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ln"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lrc"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lt"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ltg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.lv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mai"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.map-bms"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mdf"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mhr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.min"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mk"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ml"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mrj"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ms"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mt"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mus"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mwl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.my"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.myv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.mzn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.na"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nah"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nap"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nds"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nds-nl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ne"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.new"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ng"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.no"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nov"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nrm"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nso"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.nv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ny"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.oc"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.olo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.om"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.or"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.os"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pa"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pag"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pam"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pap"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pcd"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pdc"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pfl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pih"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pms"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pnb"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pnt"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ps"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.pt"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.qu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.rm"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.rmy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.rn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ro"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.roa-rup"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.roa-tara"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ru"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.rue"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.rw"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sa"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sah"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sat"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sc"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.scn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sco"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sd"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.se"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.si"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.simple"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sk"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sm"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.so"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sq"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.srn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ss"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.st"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.stq"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.su"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.sw"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.szl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ta"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tcy"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.te"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tet"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tg"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.th"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ti"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tk"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tl"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tn"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.to"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tpi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tr"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ts"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tt"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tum"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tw"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ty"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.tyv"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.udm"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ug"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.uk"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ur"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.uz"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.ve"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.vec"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.vep"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.vi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.vls"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.vo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.wa"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.war"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.wo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.wuu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.xal"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.xh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.xmf"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.yi"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.yo"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.za"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.zea"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.zh"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.zh-classical"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.zh-min-nan"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.zh-yue"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

"wikipedia/20190301.zu"

FeaturesDict({
    'text': Text(shape=(), dtype=tf.string, encoder=None),
    'title': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

None computed

Urls

Supervised keys (for as_supervised=True)

None

Citation

@ONLINE {wikidump,
    author = "Wikimedia Foundation",
    title  = "Wikimedia Downloads",
    url    = "https://dumps.wikimedia.org"
}
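
Any entry in this list can be loaded by its full "wikipedia/20190301.&lt;lang&gt;" name; a minimal sketch (the single "train" split is an assumption, since no statistics are computed for these configs):

# Croatian Wikipedia; any other config name from this list works the same way.
ds = tfds.load("wikipedia/20190301.hr", split="train")
for example in tfds.as_numpy(ds.take(2)):
    print(example['title'])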

"xnli"

XNLI is a subset of a few thousand examples from MNLI which has been translated into 14 different languages (some relatively low-resource). As with MNLI, the goal is to predict textual entailment: given two sentences, decide whether sentence A implies, contradicts, or is neutral with respect to sentence B, as a three-way classification task.

xnli is configured with tfds.text.xnli.BuilderConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.1) (Size: ?? GiB): Plain text import of XNLI

"xnli/plain_text"

FeaturesDict({
    'hypothesis': TranslationVariableLanguages({
        'language': Text(shape=(), dtype=tf.string, encoder=None),
        'translation': Text(shape=(), dtype=tf.string, encoder=None),
    }),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'premise': Translation({
        'ar': Text(shape=(), dtype=tf.string, encoder=None),
        'bg': Text(shape=(), dtype=tf.string, encoder=None),
        'de': Text(shape=(), dtype=tf.string, encoder=None),
        'el': Text(shape=(), dtype=tf.string, encoder=None),
        'en': Text(shape=(), dtype=tf.string, encoder=None),
        'es': Text(shape=(), dtype=tf.string, encoder=None),
        'fr': Text(shape=(), dtype=tf.string, encoder=None),
        'hi': Text(shape=(), dtype=tf.string, encoder=None),
        'ru': Text(shape=(), dtype=tf.string, encoder=None),
        'sw': Text(shape=(), dtype=tf.string, encoder=None),
        'th': Text(shape=(), dtype=tf.string, encoder=None),
        'tr': Text(shape=(), dtype=tf.string, encoder=None),
        'ur': Text(shape=(), dtype=tf.string, encoder=None),
        'vi': Text(shape=(), dtype=tf.string, encoder=None),
        'zh': Text(shape=(), dtype=tf.string, encoder=None),
    }),
})
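
A minimal reading sketch for the features above (the split name is an assumption, since no statistics are computed below):

ds = tfds.load("xnli", split="test")
for ex in tfds.as_numpy(ds.take(1)):
    # 'premise' is a dict keyed by language code; 'hypothesis' carries
    # parallel 'language'/'translation' arrays.
    print(ex['premise']['en'], ex['label'])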

Statistics

None computed

Urls

Supervised keys (for as_supervised=True)

None

Citation

@InProceedings{conneau2018xnli,
  author = "Conneau, Alexis
                 and Rinott, Ruty
                 and Lample, Guillaume
                 and Williams, Adina
                 and Bowman, Samuel R.
                 and Schwenk, Holger
                 and Stoyanov, Veselin",
  title = "XNLI: Evaluating Cross-lingual Sentence Representations",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods
               in Natural Language Processing",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  location = "Brussels, Belgium",
}

translate

"flores"

Evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English.

flores is configured with tfds.translate.flores.FloresConfig and has the following configurations predefined (defaults to the first one):

  • "neen_plain_text" (v0.0.3) (Size: 984.65 KiB): Translation dataset from ne to en, uses encoder plain_text.

  • "sien_plain_text" (v0.0.3) (Size: 984.65 KiB): Translation dataset from si to en, uses encoder plain_text.

"flores/neen_plain_text"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ne': Text(shape=(), dtype=tf.string, encoder=None),
})

"flores/sien_plain_text"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'si': Text(shape=(), dtype=tf.string, encoder=None),
})
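
Because flores defines supervised keys (listed below), it can also be loaded directly as (input, target) pairs; a sketch:

# as_supervised=True yields (si, en) tuples, per the supervised keys below.
ds = tfds.load("flores/sien_plain_text", split="validation", as_supervised=True)
for si, en in tfds.as_numpy(ds.take(1)):
    print(si, en)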

Statistics

Split Examples
ALL 5,664
VALIDATION 2,898
TEST 2,766

Urls

Supervised keys (for as_supervised=True)

(u'si', u'en')

Citation

@misc{guzmn2019new,
    title={Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English},
    author={Francisco Guzman and Peng-Jen Chen and Myle Ott and Juan Pino and Guillaume Lample and Philipp Koehn and Vishrav Chaudhary and Marc'Aurelio Ranzato},
    year={2019},
    eprint={1902.01382},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

"ted_hrlr_translate"

Data sets derived from TED talk transcripts for comparing similar language pairs where one is high-resource and the other is low-resource.

ted_hrlr_translate is configured with tfds.translate.ted_hrlr.TedHrlrConfig and has the following configurations predefined (defaults to the first one):

  • "az_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from az to en in plain text.

  • "aztr_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from az_tr to en in plain text.

  • "be_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from be to en in plain text.

  • "beru_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from be_ru to en in plain text.

  • "es_to_pt" (v0.0.1) (Size: 124.94 MiB): Translation dataset from es to pt in plain text.

  • "fr_to_pt" (v0.0.1) (Size: 124.94 MiB): Translation dataset from fr to pt in plain text.

  • "gl_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from gl to en in plain text.

  • "glpt_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from gl_pt to en in plain text.

  • "he_to_pt" (v0.0.1) (Size: 124.94 MiB): Translation dataset from he to pt in plain text.

  • "it_to_pt" (v0.0.1) (Size: 124.94 MiB): Translation dataset from it to pt in plain text.

  • "pt_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from pt to en in plain text.

  • "ru_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from ru to en in plain text.

  • "ru_to_pt" (v0.0.1) (Size: 124.94 MiB): Translation dataset from ru to pt in plain text.

  • "tr_to_en" (v0.0.1) (Size: 124.94 MiB): Translation dataset from tr to en in plain text.

"ted_hrlr_translate/az_to_en"

Translation({
    'az': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/aztr_to_en"

Translation({
    'az_tr': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/be_to_en"

Translation({
    'be': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/beru_to_en"

Translation({
    'be_ru': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/es_to_pt"

Translation({
    'es': Text(shape=(), dtype=tf.string, encoder=None),
    'pt': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/fr_to_pt"

Translation({
    'fr': Text(shape=(), dtype=tf.string, encoder=None),
    'pt': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/gl_to_en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'gl': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/glpt_to_en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'gl_pt': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/he_to_pt"

Translation({
    'he': Text(shape=(), dtype=tf.string, encoder=None),
    'pt': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/it_to_pt"

Translation({
    'it': Text(shape=(), dtype=tf.string, encoder=None),
    'pt': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/pt_to_en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'pt': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/ru_to_en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/ru_to_pt"

Translation({
    'pt': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"ted_hrlr_translate/tr_to_en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'tr': Text(shape=(), dtype=tf.string, encoder=None),
})
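
Any of the configurations above is selected with the "dataset/config" syntax; a sketch for the default az_to_en pair:

ds = tfds.load("ted_hrlr_translate/az_to_en", split="train")
for ex in tfds.as_numpy(ds.take(1)):
    print(ex['az'], ex['en'])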

Statistics

Split Examples
ALL 191,524
TRAIN 182,450
TEST 5,029
VALIDATION 4,045

Urls

Supervised keys (for as_supervised=True)

(u'tr', u'en')

Citation

@inproceedings{Ye2018WordEmbeddings,
  author  = {Qi, Ye and Sachan, Devendra and Felix, Matthieu and Padmanabhan, Sarguna and Neubig, Graham},
  title   = {When and Why are pre-trained word embeddings useful for Neural Machine Translation?},
  booktitle = {HLT-NAACL},
  year    = {2018},
  }

"ted_multi_translate"

Massively multilingual (60-language) dataset derived from TED Talk transcripts. Each record consists of parallel arrays of language and text. Missing and incomplete translations are filtered out.

ted_multi_translate is configured with tfds.translate.ted_multi.BuilderConfig and has the following configurations predefined (defaults to the first one):

  • "plain_text" (v0.0.3) (Size: 335.91 MiB): Plain text import of multilingual TED talk translations

"ted_multi_translate/plain_text"

FeaturesDict({
    'talk_name': Text(shape=(), dtype=tf.string, encoder=None),
    'translations': TranslationVariableLanguages({
        'language': Text(shape=(), dtype=tf.string, encoder=None),
        'translation': Text(shape=(), dtype=tf.string, encoder=None),
    }),
})
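
Since each record stores parallel 'language' and 'translation' arrays, extracting a particular language is per-example work; a sketch (the filtering logic here is illustrative, not part of the library):

import numpy as np

ds = tfds.load("ted_multi_translate", split="train")
for ex in tfds.as_numpy(ds.take(1)):
    langs = ex['translations']['language']
    texts = ex['translations']['translation']
    idx = np.where(langs == b'en')[0]  # position of the English translation, if any
    if idx.size:
        print(ex['talk_name'], texts[idx[0]])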

Statistics

Split Examples
ALL 271,360
TRAIN 258,098
TEST 7,213
VALIDATION 6,049

Urls

Supervised keys (for as_supervised=True)

None

Citation

@InProceedings{qi-EtAl:2018:N18-2,
  author    = {Qi, Ye  and  Sachan, Devendra  and  Felix, Matthieu  and  Padmanabhan, Sarguna  and  Neubig, Graham},
  title     = {When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?},
  booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
  month     = {June},
  year      = {2018},
  address   = {New Orleans, Louisiana},
  publisher = {Association for Computational Linguistics},
  pages     = {529--535},
  abstract  = {The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained. Pre-trained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from paucity of data. However, their utility for NMT has not been extensively explored. In this work, we perform five sets of experiments that analyze when we can expect pre-trained word embeddings to help in NMT tasks. We show that such embeddings can be surprisingly effective in some cases -- providing gains of up to 20 BLEU points in the most favorable setting.},
  url       = {http://www.aclweb.org/anthology/N18-2084}
}

"wmt15_translate"

Translate dataset based on the data from statmt.org.

wmt15_translate is configured with tfds.translate.wmt15.WmtConfig and has the following configurations predefined (defaults to the first one):

  • "cs-en" (v0.0.2) (Size: 1.62 GiB): WMT 2015 translation task dataset.

  • "de-en" (v0.0.2) (Size: 1.62 GiB): WMT 2015 translation task dataset.

  • "fi-en" (v0.0.2) (Size: 260.51 MiB): WMT 2015 translation task dataset.

  • "fr-en" (v0.0.2) (Size: 6.24 GiB): WMT 2015 translation task dataset.

  • "ru-en" (v0.0.2) (Size: 1.02 GiB): WMT 2015 translation task dataset.

  • "cs-en.subwords8k" (v0.0.1) (Size: ?? GiB): WMT 2015 translation dataset with subword encoding.

  • "de-en.subwords8k" (v0.0.1) (Size: 1.62 GiB): WMT 2015 translation dataset with subword encoding.

  • "fi-en.subwords8k" (v0.0.1) (Size: 260.51 MiB): WMT 2015 translation dataset with subword encoding.

  • "fr-en.subwords8k" (v0.0.1) (Size: 6.24 GiB): WMT 2015 translation dataset with subword encoding.

  • "ru-en.subwords8k" (v0.0.1) (Size: 1.02 GiB): WMT 2015 translation dataset with subword encoding.

"wmt15_translate/cs-en"

Translation({
    'cs': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt15_translate/de-en"

Translation({
    'de': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt15_translate/fi-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'fi': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt15_translate/fr-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'fr': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt15_translate/ru-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt15_translate/cs-en.subwords8k"

Translation({
    'cs': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt15_translate/de-en.subwords8k"

Translation({
    'de': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8270>),
    'en': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8212>),
})

"wmt15_translate/fi-en.subwords8k"

Translation({
    'en': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8217>),
    'fi': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8113>),
})

"wmt15_translate/fr-en.subwords8k"

Translation({
    'en': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8183>),
    'fr': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8133>),
})

"wmt15_translate/ru-en.subwords8k"

Translation({
    'en': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8194>),
    'ru': Text(shape=(None,), dtype=tf.int64, encoder=<SubwordTextEncoder vocab_size=8180>),
})
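
In the subwords8k configurations the text features are variable-length tf.int64 token sequences, and the fitted encoders are exposed through the dataset info; a decoding sketch:

data, info = tfds.load("wmt15_translate/de-en.subwords8k", with_info=True)
encoder = info.features['en'].encoder  # SubwordTextEncoder, vocab_size=8212
for ex in tfds.as_numpy(data['train'].take(1)):
    print(encoder.decode(ex['en']))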

Statistics

Split Examples
ALL 2,506,905
TRAIN 2,495,081
VALIDATION 9,006
TEST 2,818

Urls

Supervised keys (for as_supervised=True)

(u'ru', u'en')

Citation

@InProceedings{bojar-EtAl:2015:WMT,
  author    = {Bojar, Ond{\v{r}}ej  and  Chatterjee, Rajen  and  Federmann, Christian  and  Haddow, Barry  and  Huck, Matthias  and  Hokamp, Chris  and  Koehn, Philipp  and  Logacheva, Varvara  and  Monz, Christof  and  Negri, Matteo  and  Post, Matt  and  Scarton, Carolina  and  Specia, Lucia  and  Turchi, Marco},
  title     = {Findings of the 2015 Workshop on Statistical Machine Translation},
  booktitle = {Proceedings of the Tenth Workshop on Statistical Machine Translation},
  month     = {September},
  year      = {2015},
  address   = {Lisbon, Portugal},
  publisher = {Association for Computational Linguistics},
  pages     = {1--46},
  url       = {http://aclweb.org/anthology/W15-3001}
}

"wmt16_translate"

Translate dataset based on the data from statmt.org.

wmt16_translate is configured with tfds.translate.wmt16.WmtConfig and has the following configurations predefined (defaults to the first one):

  • "cs-en" (v0.0.1) (Size: 1.57 GiB): WMT 2016 translation task dataset.

  • "de-en" (v0.0.1) (Size: 1.57 GiB): WMT 2016 translation task dataset.

  • "fi-en" (v0.0.1) (Size: 260.51 MiB): WMT 2016 translation task dataset.

  • "ro-en" (v0.0.1) (Size: 273.83 MiB): WMT 2016 translation task dataset.

  • "ru-en" (v0.0.1) (Size: 993.38 MiB): WMT 2016 translation task dataset.

  • "tr-en" (v0.0.1) (Size: 59.32 MiB): WMT 2016 translation task dataset.

"wmt16_translate/cs-en"

Translation({
    'cs': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt16_translate/de-en"

Translation({
    'de': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt16_translate/fi-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'fi': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt16_translate/ro-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ro': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt16_translate/ru-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt16_translate/tr-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'tr': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

Split Examples
ALL 209,757
TRAIN 205,756
TEST 3,000
VALIDATION 1,001

Urls

Supervised keys (for as_supervised=True)

(u'tr', u'en')

Citation

@InProceedings{bojar-EtAl:2016:WMT1,
  author    = {Bojar, Ond{\v{r}}ej  and  Chatterjee, Rajen  and  Federmann, Christian  and  Graham, Yvette  and  Haddow, Barry  and  Huck, Matthias  and  Jimeno Yepes, Antonio  and  Koehn, Philipp  and  Logacheva, Varvara  and  Monz, Christof  and  Negri, Matteo  and  Neveol, Aurelie  and  Neves, Mariana  and  Popel, Martin  and  Post, Matt  and  Rubino, Raphael  and  Scarton, Carolina  and  Specia, Lucia  and  Turchi, Marco  and  Verspoor, Karin  and  Zampieri, Marcos},
  title     = {Findings of the 2016 Conference on Machine Translation},
  booktitle = {Proceedings of the First Conference on Machine Translation},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {131--198},
  url       = {http://www.aclweb.org/anthology/W/W16/W16-2301}
}

"wmt17_translate"

Translate dataset based on the data from statmt.org.

wmt17_translate is configured with tfds.translate.wmt17.WmtConfig and has the following configurations predefined (defaults to the first one):

  • "cs-en" (v0.0.2) (Size: 1.66 GiB): WMT 2017 translation task dataset.

  • "de-en" (v0.0.2) (Size: 1.81 GiB): WMT 2017 translation task dataset.

  • "fi-en" (v0.0.2) (Size: 414.10 MiB): WMT 2017 translation task dataset.

  • "lv-en" (v0.0.2) (Size: 161.69 MiB): WMT 2017 translation task dataset.

  • "ru-en" (v0.0.2) (Size: 3.34 GiB): WMT 2017 translation task dataset.

  • "tr-en" (v0.0.2) (Size: 59.32 MiB): WMT 2017 translation task dataset.

  • "zh-en" (v0.0.2) (Size: 2.16 GiB): WMT 2017 translation task dataset.

"wmt17_translate/cs-en"

Translation({
    'cs': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt17_translate/de-en"

Translation({
    'de': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt17_translate/fi-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'fi': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt17_translate/lv-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'lv': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt17_translate/ru-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt17_translate/tr-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'tr': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt17_translate/zh-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'zh': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

Split Examples
ALL 25,140,612
TRAIN 25,136,609
VALIDATION 2,002
TEST 2,001

Urls

Supervised keys (for as_supervised=True)

(u'zh', u'en')

Citation

@InProceedings{bojar-EtAl:2017:WMT1,
  author    = {Bojar, Ond{\v{r}}ej  and  Chatterjee, Rajen  and  Federmann, Christian  and  Graham, Yvette  and  Haddow, Barry  and  Huang, Shujian  and  Huck, Matthias  and  Koehn, Philipp  and  Liu, Qun  and  Logacheva, Varvara  and  Monz, Christof  and  Negri, Matteo  and  Post, Matt  and  Rubino, Raphael  and  Specia, Lucia  and  Turchi, Marco},
  title     = {Findings of the 2017 Conference on Machine Translation (WMT17)},
  booktitle = {Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers},
  month     = {September},
  year      = {2017},
  address   = {Copenhagen, Denmark},
  publisher = {Association for Computational Linguistics},
  pages     = {169--214},
  url       = {http://www.aclweb.org/anthology/W17-4717}
}

"wmt18_translate"

Translate dataset based on the data from statmt.org.

wmt18_translate is configured with tfds.translate.wmt18.WmtConfig and has the following configurations predefined (defaults to the first one):

  • "cs-en" (v0.0.2) (Size: 1.89 GiB): WMT 2018 translation task dataset.

  • "de-en" (v0.0.2) (Size: 3.55 GiB): WMT 2018 translation task dataset.

  • "et-en" (v0.0.2) (Size: 499.91 MiB): WMT 2018 translation task dataset.

  • "fi-en" (v0.0.2) (Size: 468.76 MiB): WMT 2018 translation task dataset.

  • "kk-en" (v0.0.2) (Size: ?? GiB): WMT 2018 translation task dataset.

  • "ru-en" (v0.0.2) (Size: 3.91 GiB): WMT 2018 translation task dataset.

  • "tr-en" (v0.0.2) (Size: 59.32 MiB): WMT 2018 translation task dataset.

  • "zh-en" (v0.0.2) (Size: 2.10 GiB): WMT 2018 translation task dataset.

"wmt18_translate/cs-en"

Translation({
    'cs': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/de-en"

Translation({
    'de': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/et-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'et': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/fi-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'fi': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/kk-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'kk': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/ru-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/tr-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'tr': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt18_translate/zh-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'zh': Text(shape=(), dtype=tf.string, encoder=None),
})

Statistics

Split Examples
ALL 25,170,193
TRAIN 25,162,209
VALIDATION 4,003
TEST 3,981

Urls

Supervised keys (for as_supervised=True)

(u'zh', u'en')

Citation

@InProceedings{bojar-EtAl:2018:WMT1,
  author    = {Bojar, Ond{\v{r}}ej  and  Federmann, Christian  and  Fishel, Mark
    and Graham, Yvette  and  Haddow, Barry  and  Huck, Matthias  and
    Koehn, Philipp  and  Monz, Christof},
  title     = {Findings of the 2018 Conference on Machine Translation (WMT18)},
  booktitle = {Proceedings of the Third Conference on Machine Translation,
    Volume 2: Shared Task Papers},
  month     = {October},
  year      = {2018},
  address   = {Belgium, Brussels},
  publisher = {Association for Computational Linguistics},
  pages     = {272--307},
  url       = {http://www.aclweb.org/anthology/W18-6401}
}

"wmt19_translate"

Translate dataset based on the data from statmt.org.

wmt19_translate is configured with tfds.translate.wmt19.WmtConfig and has the following configurations predefined (defaults to the first one):

  • "cs-en" (v0.0.2) (Size: 1.88 GiB): WMT 2019 translation task dataset.

  • "de-en" (v0.0.2) (Size: 9.71 GiB): WMT 2019 translation task dataset.

  • "fi-en" (v0.0.2) (Size: 959.46 MiB): WMT 2019 translation task dataset.

  • "gu-en" (v0.0.2) (Size: 37.03 MiB): WMT 2019 translation task dataset.

  • "kk-en" (v0.0.2) (Size: 39.58 MiB): WMT 2019 translation task dataset.

  • "lt-en" (v0.0.2) (Size: 392.20 MiB): WMT 2019 translation task dataset.

  • "ru-en" (v0.0.2) (Size: 3.86 GiB): WMT 2019 translation task dataset.

  • "zh-en" (v0.0.2) (Size: 2.04 GiB): WMT 2019 translation task dataset.

  • "fr-de" (v0.0.2) (Size: 722.20 MiB): WMT 2019 translation task dataset.

"wmt19_translate/cs-en"

Translation({
    'cs': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/de-en"

Translation({
    'de': Text(shape=(), dtype=tf.string, encoder=None),
    'en': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/fi-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'fi': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/gu-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'gu': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/kk-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'kk': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/lt-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'lt': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/ru-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'ru': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/zh-en"

Translation({
    'en': Text(shape=(), dtype=tf.string, encoder=None),
    'zh': Text(shape=(), dtype=tf.string, encoder=None),
})

"wmt19_translate/fr-de"

Translation({
    'de': Text(shape=(), dtype=tf.string, encoder=None),
    'fr': Text(shape=(), dtype=tf.string, encoder=None),
})
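
The builder API works for any of these configurations as well; a sketch for fr-de (note the download sizes listed above):

builder = tfds.builder("wmt19_translate/fr-de")
builder.download_and_prepare()
ds = builder.as_dataset(split="train")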

Statistics

Split Examples
ALL 9,825,988
TRAIN 9,824,476
VALIDATION 1,512

Urls

Supervised keys (for as_supervised=True)

(u'fr', u'de')

Citation

@ONLINE {wmt19translate,
    author = "Wikimedia Foundation",
    title  = "ACL 2019 Fourth Conference on Machine Translation (WMT19), Shared Task: Machine Translation of News",
    url    = "http://www.statmt.org/wmt19/translation-task.html"
}

video

"bair_robot_pushing_small"

This dataset contains roughly 44,000 examples of robot pushing motions, including one training set (train) and two test sets of previously seen (testseen) and unseen (testnovel) objects. This is the small 64x64 version.

Features

SequenceDict({
    'action': Tensor(shape=(4,), dtype=tf.float32),
    'endeffector_pos': Tensor(shape=(3,), dtype=tf.float32),
    'image_aux1': Image(shape=(64, 64, 3), dtype=tf.uint8),
    'image_main': Image(shape=(64, 64, 3), dtype=tf.uint8),
})
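
Each example above is a whole pushing sequence, so every per-step feature comes back with a leading time dimension; a sketch:

ds = tfds.load("bair_robot_pushing_small", split="test")
for ex in tfds.as_numpy(ds.take(1)):
    # (sequence_length, 4) actions; (sequence_length, 64, 64, 3) frames.
    print(ex['action'].shape, ex['image_main'].shape)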

Statistics

Split Examples
ALL 43,520
TRAIN 43,264
TEST 256

Urls

Supervised keys (for as_supervised=True)

None

Citation

@inproceedings{conf/nips/FinnGL16,
  added-at = {2016-12-16T00:00:00.000+0100},
  author = {Finn, Chelsea and Goodfellow, Ian J. and Levine, Sergey},
  biburl = {https://www.bibsonomy.org/bibtex/230073873b4fe43b314724b772d0f9256/dblp},
  booktitle = {NIPS},
  crossref = {conf/nips/2016},
  editor = {Lee, Daniel D. and Sugiyama, Masashi and Luxburg, Ulrike V. and Guyon, Isabelle and Garnett, Roman},
  ee = {http://papers.nips.cc/paper/6161-unsupervised-learning-for-physical-interaction-through-video-prediction},
  interhash = {2e6b416723704f4aa5ad0686ce5a3593},
  intrahash = {30073873b4fe43b314724b772d0f9256},
  keywords = {dblp},
  pages = {64-72},
  timestamp = {2016-12-17T11:33:40.000+0100},
  title = {Unsupervised Learning for Physical Interaction through Video Prediction.},
  url = {http://dblp.uni-trier.de/db/conf/nips/nips2016.html#FinnGL16},
  year = 2016
}

"moving_mnist"

Moving variant of the MNIST database of handwritten digits. This is the data used by the authors for reporting model performance. See tfds.video.moving_mnist.image_as_moving_sequence for generating training/validation data from the MNIST dataset.
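
A sketch of generating training sequences with the helper mentioned above (the exact keyword arguments are an assumption here):

mnist = tfds.load("mnist", split="train", as_supervised=True)

def to_sequence(image, label):
    # image_as_moving_sequence returns a namedtuple whose image_sequence
    # field is a (sequence_length, 64, 64, 1) video tensor (assumed signature).
    seq = tfds.video.moving_mnist.image_as_moving_sequence(
        image, sequence_length=20)
    return seq.image_sequence

train_videos = mnist.map(to_sequence)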

Features

FeaturesDict({
    'image_sequence': Video(shape=(20, 64, 64, 1), dtype=tf.uint8, feature=Image(shape=(64, 64, 1), dtype=tf.uint8)),
})

Statistics

Split Examples
TEST 10,000
ALL 10,000

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{DBLP:journals/corr/SrivastavaMS15,
  author    = {Nitish Srivastava and
               Elman Mansimov and
               Ruslan Salakhutdinov},
  title     = {Unsupervised Learning of Video Representations using LSTMs},
  journal   = {CoRR},
  volume    = {abs/1502.04681},
  year      = {2015},
  url       = {http://arxiv.org/abs/1502.04681},
  archivePrefix = {arXiv},
  eprint    = {1502.04681},
  timestamp = {Mon, 13 Aug 2018 16:47:05 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/SrivastavaMS15},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"starcraft_video"

This dataset contains videos generated from StarCraft.

starcraft_video is configured with tfds.video.starcraft.StarcraftVideoConfig and has the following configurations predefined (defaults to the first one):

  • "brawl_64" (v0.1.2) (Size: 6.40 GiB): Brawl map with 64x64 resolution.

  • "brawl_128" (v0.1.2) (Size: 20.76 GiB): Brawl map with 128x128 resolution.

  • "collect_mineral_shards_64" (v0.1.2) (Size: 7.83 GiB): CollectMineralShards map with 64x64 resolution.

  • "collect_mineral_shards_128" (v0.1.2) (Size: 24.83 GiB): CollectMineralShards map with 128x128 resolution.

  • "move_unit_to_border_64" (v0.1.2) (Size: 1.77 GiB): MoveUnitToBorder map with 64x64 resolution.

  • "move_unit_to_border_128" (v0.1.2) (Size: 5.75 GiB): MoveUnitToBorder map with 128x128 resolution.

  • "road_trip_with_medivac_64" (v0.1.2) (Size: 2.48 GiB): RoadTripWithMedivac map with 64x64 resolution.

  • "road_trip_with_medivac_128" (v0.1.2) (Size: 7.80 GiB): RoadTripWithMedivac map with 128x128 resolution.

"starcraft_video/brawl_64"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 64, 64, 3), dtype=tf.uint8, feature=Image(shape=(64, 64, 3), dtype=tf.uint8)),
})

"starcraft_video/brawl_128"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 128, 128, 3), dtype=tf.uint8, feature=Image(shape=(128, 128, 3), dtype=tf.uint8)),
})

"starcraft_video/collect_mineral_shards_64"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 64, 64, 3), dtype=tf.uint8, feature=Image(shape=(64, 64, 3), dtype=tf.uint8)),
})

"starcraft_video/collect_mineral_shards_128"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 128, 128, 3), dtype=tf.uint8, feature=Image(shape=(128, 128, 3), dtype=tf.uint8)),
})

"starcraft_video/move_unit_to_border_64"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 64, 64, 3), dtype=tf.uint8, feature=Image(shape=(64, 64, 3), dtype=tf.uint8)),
})

"starcraft_video/move_unit_to_border_128"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 128, 128, 3), dtype=tf.uint8, feature=Image(shape=(128, 128, 3), dtype=tf.uint8)),
})

"starcraft_video/road_trip_with_medivac_64"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 64, 64, 3), dtype=tf.uint8, feature=Image(shape=(64, 64, 3), dtype=tf.uint8)),
})

"starcraft_video/road_trip_with_medivac_128"

FeaturesDict({
    'rgb_screen': Video(shape=(None, 128, 128, 3), dtype=tf.uint8, feature=Image(shape=(128, 128, 3), dtype=tf.uint8)),
})

Statistics

Split Examples
ALL 14,000
TRAIN 10,000
VALIDATION 2,000
TEST 2,000

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{DBLP:journals/corr/abs-1812-01717,
  author    = {Thomas Unterthiner and
               Sjoerd van Steenkiste and
               Karol Kurach and
               Rapha{"{e}}l Marinier and
               Marcin Michalski and
               Sylvain Gelly},
  title     = {Towards Accurate Generative Models of Video: {A} New Metric and
               Challenges},
  journal   = {CoRR},
  volume    = {abs/1812.01717},
  year      = {2018},
  url       = {http://arxiv.org/abs/1812.01717},
  archivePrefix = {arXiv},
  eprint    = {1812.01717},
  timestamp = {Tue, 01 Jan 2019 15:01:25 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1812-01717},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

"ucf101"

A 101-label video classification dataset.

ucf101 is configured with tfds.video.ucf101.Ucf101Config and has the following configurations predefined (defaults to the first one):

  • "ucf101_1_256" (v1.0.0) (Size: ?? GiB): 256x256 UCF with the first action recognition split.

"ucf101/ucf101_1_256"

FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=101),
    'video': Video(shape=(None, 256, 256, 3), dtype=tf.uint8, feature=Image(shape=(256, 256, 3), dtype=tf.uint8)),
})
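
The config-suffix convention applies here as well; a builder lets you inspect the feature spec before committing to the large download. A minimal sketch, assuming the standard "train" split name for the first action-recognition split:

import tensorflow_datasets as tfds

builder = tfds.builder("ucf101/ucf101_1_256")
print(builder.info.features)    # label (101 classes), variable-length video

builder.download_and_prepare()  # UCF101 videos are large; this takes a while
ds = builder.as_dataset(split="train")

for example in tfds.as_numpy(ds.take(1)):
    print(example['video'].shape)  # (num_frames, 256, 256, 3)
    print(example['label'])        # int in [0, 101)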

Statistics

None computed

Urls

Supervised keys (for as_supervised=True)

None

Citation

@article{DBLP:journals/corr/abs-1212-0402,
  author    = {Khurram Soomro and
               Amir Roshan Zamir and
               Mubarak Shah},
  title     = { {UCF101:} {A} Dataset of 101 Human Actions Classes From Videos in
               The Wild},
  journal   = {CoRR},
  volume    = {abs/1212.0402},
  year      = {2012},
  url       = {http://arxiv.org/abs/1212.0402},
  archivePrefix = {arXiv},
  eprint    = {1212.0402},
  timestamp = {Mon, 13 Aug 2018 16:47:45 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1212-0402},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}