tff.simulation.datasets.gldv2.load_data

Loads a federated version of the Google Landmark v2 dataset.

The dataset consists of photos of various world landmarks, with images grouped by photographer to achieve a federated partitioning of the data. The dataset is downloaded and cached locally. On later calls, it is loaded from the cache if a previous download is found there.

The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values:

  • 'image/decoded': A tf.Tensor with dtype=tf.uint8 that corresponds to the pixels of the landmark images.
  • 'class': A tf.Tensor with dtype=tf.int64 and shape [1], corresponding to the class label of the landmark ([0, 203) for gld23k, [0, 2028) for gld160k).
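
As a rough illustration of the element structure described above, the following sketch checks that an element carries both keys and that its label falls inside the expected range. The helper name check_example and the stand-in element are hypothetical and not part of TFF. The element here is built from plain Python lists so the snippet runs without TensorFlow, on the assumption that a real element can be indexed the same way.

```python
import collections

# Keys documented for each element of a client dataset.
EXPECTED_KEYS = ("image/decoded", "class")


def check_example(example, num_classes):
    """Hypothetical helper: validate one element and return its integer label.

    `example` is any mapping with the two documented keys; `num_classes`
    is 203 for gld23k or 2028 for gld160k.
    """
    missing = [key for key in EXPECTED_KEYS if key not in example]
    if missing:
        raise KeyError(f"element is missing keys: {missing}")
    # 'class' has shape [1], so the label is its single entry.
    label = int(example["class"][0])
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    return label


# Stand-in element (assumption: plain lists instead of tf.Tensor values).
fake_example = collections.OrderedDict(
    [("image/decoded", [[[0, 0, 0]]]), ("class", [7])]
)
label = check_example(fake_example, num_classes=203)
```

With a real element in eager mode, the same call should apply unchanged, since a shape-[1] tensor can be indexed and converted with int().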

Two flavors of the GLD dataset are available. When gld23k is true, a smaller version of the federated Google Landmark dataset is provided for faster iteration. The gld23k dataset contains 203 classes, 233 clients and 23,080 images. When gld23k is false, the gld160k dataset (https://arxiv.org/abs/2003.08082) is provided. The gld160k dataset contains 2,028 classes, 1,262 clients and 164,172 images.
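
A minimal sketch of choosing between the two flavors follows. The GLD_VARIANTS table simply restates the figures above, and the names GLD_VARIANTS, variant_name and load_gld are hypothetical wrappers, not part of TFF. The sketch assumes an installed TFF release that ships tff.simulation.datasets.gldv2, and it imports TFF lazily so that the constants can be used without it.

```python
# Figures restated from the description above.
GLD_VARIANTS = {
    "gld23k": {"classes": 203, "clients": 233, "images": 23080},
    "gld160k": {"classes": 2028, "clients": 1262, "images": 164172},
}


def variant_name(gld23k):
    """Map the gld23k flag to the name of the dataset it selects."""
    return "gld23k" if gld23k else "gld160k"


def load_gld(gld23k=True, cache_dir=None):
    """Hypothetical wrapper around load_data; downloads on first use."""
    # Imported here so the module can be imported without TFF installed.
    import tensorflow_federated as tff

    kwargs = {"gld23k": gld23k}
    if cache_dir is not None:
        kwargs["cache_dir"] = cache_dir
    train, test = tff.simulation.datasets.gldv2.load_data(**kwargs)
    return train, test


# Number of output classes a model would need for the small variant.
num_classes = GLD_VARIANTS[variant_name(True)]["classes"]
```

Starting with gld23k=True keeps the first download small; switching to the full dataset only requires flipping the flag and sizing the model for 2,028 classes.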

Args:

  • num_worker: (Optional) The number of threads used to download the GLD v2 dataset.
  • cache_dir: (Optional) The directory in which to cache the downloaded files. If None, Keras' default cache directory is used.
  • gld23k: (Optional) When true, loads gld23k, the smaller version of the federated Google Landmark v2 dataset, intended for faster prototyping.
  • base_url: (Optional) The base URL from which to download the GLD v2 image shards.

Returns a tuple of (train, test), where train is a tff.simulation.datasets.ClientData and test is a tf.data.Dataset.
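
One possible way to inspect the train element of the returned tuple is sketched below. The helper peek_first_client is hypothetical. It relies only on the client_ids property and the create_tf_dataset_for_client method of ClientData, and it assumes the per-client dataset can be iterated directly, as a tf.data.Dataset can in eager mode.

```python
import itertools


def peek_first_client(train, num_examples=1):
    """Hypothetical helper: return (client_id, first few elements).

    `train` is expected to behave like tff.simulation.datasets.ClientData.
    """
    client_id = train.client_ids[0]
    dataset = train.create_tf_dataset_for_client(client_id)
    # islice stops after num_examples elements without reading the rest.
    examples = list(itertools.islice(iter(dataset), num_examples))
    return client_id, examples


# Intended use (not executed here, since it triggers the download):
#   import tensorflow_federated as tff
#   train, test = tff.simulation.datasets.gldv2.load_data(gld23k=True)
#   client_id, examples = peek_first_client(train)
#   print(client_id, examples[0]['class'])
```

The test element is a single tf.data.Dataset rather than a ClientData, so it can be iterated or batched directly without selecting a client first.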