Loads a federated version of the Google Landmark v2 dataset.
tff.simulation.datasets.gldv2.load_data(num_worker: int = 1, cache_dir: str = 'cache', gld23k: bool = False, base_url: str = GLD_SHARD_BASE_URL)
The dataset consists of photos of various world landmarks, with images grouped by photographer to achieve a federated partitioning of the data. The dataset is downloaded and cached locally. If it was downloaded previously, the function tries to load it from the cache instead.
tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys:
A tensor with dtype=tf.uint8 that corresponds to the pixels of the landmark images.
A tensor with dtype=tf.int64 and shape , corresponding to the class label of the landmark ([0, 203) for gld23k, [0, 2028) for gld160k).
Two flavors of the GLD dataset are available. When gld23k is true, a minimal version of the federated Google Landmark dataset is provided for faster iteration. The gld23k dataset contains 203 classes, 233 clients, and 23,080 images. When gld23k is false, the gld160k dataset (https://arxiv.org/abs/2003.08082) is provided. The gld160k dataset contains 2,028 classes, 1,262 clients, and 164,172 images.
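The sizes of the two flavors can be captured in a small lookup, which is handy for sizing a model's output layer or sanity-checking labels. The following is a minimal sketch in plain Python; the names GldFlavor, GLD23K, GLD160K, flavor_for, is_valid_label, and mean_images_per_client are hypothetical helpers for illustration and are not part of the TFF API. The numbers are copied from the description above.

```python
from typing import NamedTuple


class GldFlavor(NamedTuple):
    """Summary statistics for one dataset flavor (hypothetical helper type)."""
    name: str
    num_classes: int
    num_clients: int
    num_images: int


# Values taken from the dataset description; not constants exposed by TFF.
GLD23K = GldFlavor("gld23k", num_classes=203, num_clients=233, num_images=23080)
GLD160K = GldFlavor("gld160k", num_classes=2028, num_clients=1262, num_images=164172)


def flavor_for(gld23k: bool) -> GldFlavor:
    """Maps the gld23k flag to the matching flavor statistics."""
    return GLD23K if gld23k else GLD160K


def is_valid_label(label: int, gld23k: bool) -> bool:
    """Checks that a class label lies in the half-open range [0, num_classes)."""
    return 0 <= label < flavor_for(gld23k).num_classes


def mean_images_per_client(gld23k: bool) -> float:
    """Average number of images held by one client in the chosen flavor."""
    flavor = flavor_for(gld23k)
    return flavor.num_images / flavor.num_clients
```

For example, flavor_for(True).num_classes gives the output dimension a classifier would need for gld23k, and is_valid_label(203, True) is false because labels for that flavor stop at 202.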
|num_worker|(Optional) The number of threads for downloading the GLD v2 dataset.|
|cache_dir|(Optional) The directory to cache the downloaded file. If
|gld23k|(Optional) When true, a smaller version of the federated Google Landmark v2 dataset will be loaded. This gld23k dataset is used for faster prototyping.|
|base_url|(Optional) The base URL to download GLD v2 image shards.|
Tuple of (train, test) where the tuple elements are
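A typical call might look like the sketch below. It assumes a TensorFlow Federated release that ships tff.simulation.datasets.gldv2, network access for the first download, and that the first tuple element is the ClientData described above. The wrapper name first_client_examples and its num_examples parameter are hypothetical. The import is placed inside the function so that defining it triggers neither the import nor the download.

```python
def first_client_examples(cache_dir: str = "cache", num_examples: int = 2):
    """Loads the small gld23k flavor and returns a few examples from one client.

    Hypothetical helper for illustration; not part of the TFF API.
    """
    # Imported lazily so that defining this function does not require TFF.
    import tensorflow_federated as tff

    # gld23k=True selects the smaller flavor for faster prototyping.
    train, _ = tff.simulation.datasets.gldv2.load_data(
        num_worker=1, cache_dir=cache_dir, gld23k=True
    )

    # Build the tf.data.Dataset that belongs to the first client.
    client_id = train.client_ids[0]
    dataset = train.create_tf_dataset_for_client(client_id)

    # Each element is a collections.OrderedDict of tensors.
    return list(dataset.take(num_examples))
```

Calling list(example.keys()) on one of the returned elements is a simple way to inspect which fields a given TFF version provides.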