Loads a federated version of the CIFAR-100 dataset.
tff.simulation.datasets.cifar100.load_data(cache_dir=None)
The dataset is downloaded and cached locally. If previously downloaded, it tries to load the dataset from cache.
The dataset is derived from CIFAR-100. The training examples are partitioned across 500 clients and the testing examples across 100 clients. No two clients share a data sample: the train clients form a true partition of the CIFAR-100 training split, and the test clients form a true partition of the CIFAR-100 testing split. The train clients have string client IDs in the range [0-499], while the test clients have string client IDs in the range [0-99].
The data is partitioned using a hierarchical Latent Dirichlet Allocation (LDA) process, referred to as the Pachinko Allocation Method (PAM). This is a two-stage LDA process: each client has a multinomial distribution over the coarse labels of CIFAR-100 and, for each coarse label, a coarse-to-fine multinomial distribution over the fine labels under that coarse label. The coarse label multinomial is drawn from a symmetric Dirichlet with parameter 0.1, and each coarse-to-fine multinomial is drawn from a symmetric Dirichlet with parameter 10. Each client has 100 samples. To generate a sample for a client, we first select a coarse label by drawing from the coarse label multinomial, and then draw a fine label from the corresponding coarse-to-fine multinomial. We then randomly draw a sample from CIFAR-100 with that fine label (without replacement). If this exhausts the set of samples with this label, we remove the label from the coarse-to-fine multinomial and renormalize it.
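The two-stage sampling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used to build the dataset: the function names, the toy label hierarchy, and the seed are hypothetical. The sketch draws each client's multinomials only over labels that still have samples, and it also drops a coarse label once all of its fine labels are exhausted. Both choices are assumptions; the description above only specifies removing exhausted fine labels.

```python
import random


def symmetric_dirichlet(alpha, size, rng):
    """Sample a symmetric Dirichlet by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def pachinko_partition(pools, hierarchy, num_clients, samples_per_client,
                       coarse_alpha=0.1, fine_alpha=10.0, seed=0):
    """Assign sample indices to clients with a coarse-then-fine label draw.

    pools: dict mapping fine label -> list of sample indices with that label.
    hierarchy: dict mapping coarse label -> list of fine labels under it.
    """
    if sum(len(ids) for ids in pools.values()) < num_clients * samples_per_client:
        raise ValueError("not enough samples for the requested partition")
    rng = random.Random(seed)
    # Shuffled copies, so pop() draws a random sample without replacement.
    remaining = {fine: rng.sample(ids, len(ids)) for fine, ids in pools.items()}
    clients = []
    for _ in range(num_clients):
        # Per-client coarse-to-fine multinomials (assumption: only over labels
        # that still have samples left).
        fine_probs = {}
        for coarse, fines in hierarchy.items():
            live = [f for f in fines if remaining[f]]
            if live:
                weights = symmetric_dirichlet(fine_alpha, len(live), rng)
                fine_probs[coarse] = dict(zip(live, weights))
        # Per-client multinomial over the coarse labels that remain available.
        coarse_probs = dict(
            zip(fine_probs, symmetric_dirichlet(coarse_alpha, len(fine_probs), rng)))
        assigned = []
        for _ in range(samples_per_client):
            coarse = rng.choices(list(coarse_probs),
                                 weights=list(coarse_probs.values()))[0]
            fines = fine_probs[coarse]
            fine = rng.choices(list(fines), weights=list(fines.values()))[0]
            assigned.append(remaining[fine].pop())
            if not remaining[fine]:
                # Exhausted label: remove it. rng.choices treats the remaining
                # weights as relative, which renormalizes implicitly.
                del fines[fine]
                if not fines:
                    # Assumption: a coarse label with no fine labels left is dropped.
                    del coarse_probs[coarse]
        clients.append(assigned)
    return clients


# Toy stand-in for CIFAR-100: 4 coarse labels x 3 fine labels x 10 samples each.
hierarchy = {c: [3 * c + k for k in range(3)] for c in range(4)}
pools = {f: list(range(10 * f, 10 * f + 10)) for f in range(12)}
clients = pachinko_partition(pools, hierarchy, num_clients=6, samples_per_client=20)
```

With the small coarse parameter (0.1), each client's mass tends to concentrate on a few coarse labels, while the larger fine parameter (10) spreads samples more evenly across the fine labels under them.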
Data set sizes:
- train: 50,000 examples
- test: 10,000 examples
The tf.data.Datasets returned by tff.simulation.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values:
- 'coarse_label': a tf.Tensor with dtype=tf.int64 and shape [] that corresponds to the coarse label of the associated image. Labels are in the range [0-19].
- 'image': a tf.Tensor with dtype=tf.uint8 and shape [32, 32, 3], corresponding to the pixels of the image, with values in the range [0, 255].
- 'label': a tf.Tensor with dtype=tf.int64 and shape [], the class label of the corresponding image. Labels are in the range [0-99].
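A short usage sketch may help when inspecting the element structure described above. It assumes tensorflow_federated is installed and that the dataset can be downloaded or is already cached. The helper names first_train_example and describe are hypothetical and not part of the library. The import is placed inside the function so that nothing is loaded or downloaded until it is called.

```python
def first_train_example(cache_dir=None):
    """Return (client_id, element) for the first element of the first train client."""
    # Imported lazily: the package is heavy and load_data may download the dataset.
    import tensorflow_federated as tff

    train, _test = tff.simulation.datasets.cifar100.load_data(cache_dir=cache_dir)
    client_id = train.client_ids[0]
    dataset = train.create_tf_dataset_for_client(client_id)
    element = next(iter(dataset))  # an OrderedDict of tensors
    return client_id, element


def describe(element):
    """Map each key of an element to its (dtype, shape) for quick inspection."""
    return {key: (value.dtype, tuple(value.shape)) for key, value in element.items()}
```

Calling describe(first_train_example()[1]) should list one entry per key, which can be compared against the dtypes and shapes documented above.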
cache_dir: (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory.
Tuple of (train, test) where the tuple elements are