tff.simulation.TransformingClientData

Transforms client data, potentially expanding by adding pseudo-clients.

Inherits From: ClientData

Each client of the raw_client_data is "expanded" into some number of pseudo-clients. Each pseudo-client ID is a string consisting of the original client ID plus a concatenated integer index. For example, the raw client ID "client_a" might be expanded into pseudo-client IDs "client_a_0", "client_a_1" and "client_a_2".

A function fn(x) maps datapoint x to a new datapoint, where the constructor of fn is parameterized by the (raw) client_id and index i. For example, if x is an image, then make_transform_fn("client_a", 0)(x) might be the identity, while make_transform_fn("client_a", 1)(x) could be a random rotation of the image with the angle determined by a hash of "client_a" and "1". By convention, index 0 corresponds to the identity function if the identity is supported.
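As an illustration of the transform constructor described above, here is a minimal, library-free sketch. It assumes an image is a plain nested list of rows, and it simplifies the "random rotation" to a hash-derived number of quarter turns; the helper name rotate_once and the use of SHA-256 are illustrative choices, not part of the library.

```python
import hashlib


def make_transform_fn(raw_client_id, index):
    """Hypothetical transform constructor; index 0 is the identity."""
    if index == 0:
        return lambda x: x

    # Derive a stable value from the raw client ID and the index.
    # hashlib is used because Python's built-in hash() of strings
    # is salted per process and would not be reproducible.
    key = f"{raw_client_id}_{index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    quarter_turns = 1 + digest[0] % 3  # 1, 2 or 3 quarter turns

    def rotate_once(image):
        # Rotate 90 degrees clockwise: reverse the rows, then transpose.
        return [list(row) for row in zip(*image[::-1])]

    def transform(image):
        for _ in range(quarter_turns):
            image = rotate_once(image)
        return image

    return transform


image = [[1, 2], [3, 4]]
unchanged = make_transform_fn("client_a", 0)(image)
rotated = make_transform_fn("client_a", 1)(image)
```

Because the rotation depends only on the raw client ID and the index, the same pseudo-client always sees the same transformed data.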

Args
raw_client_data A ClientData to expand.
make_transform_fn A function that returns a callable that maps datapoint x to a new datapoint x'. make_transform_fn will be called as make_transform_fn(raw_client_id, i), where i is an integer index, and should return a function fn(x)->x'. For example, if x is an image, then make_transform_fn("client_a", 0)(x) might be the identity, while make_transform_fn("client_a", 1)(x) could be a random rotation of the image with the angle determined by a hash of "client_a" and "1". If make_transform_fn returns None, no transformation is performed. By convention, index 0 corresponds to the identity function if the identity is supported.
num_transformed_clients The total number of transformed clients to produce. If it is an integer multiple k of the number of real clients, there will be exactly k pseudo-clients per real client, with indices 0...k-1. If there is a remainder g, one additional pseudo-client with index k is generated for each of the first g real clients.
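The allocation rule for num_transformed_clients can be sketched in plain Python. The function name expand_client_ids is hypothetical, and the ordering of the returned IDs is an assumption; only the "<raw_id>_<index>" naming and the k/g split come from the description above.

```python
def expand_client_ids(raw_client_ids, num_transformed_clients):
    """Illustrative only: lists pseudo-client IDs for each raw client."""
    # k pseudo-clients per raw client, plus g leftover pseudo-clients.
    k, g = divmod(num_transformed_clients, len(raw_client_ids))
    expanded = []
    for position, raw_id in enumerate(raw_client_ids):
        # The first g raw clients receive one extra pseudo-client (index k).
        count = k + 1 if position < g else k
        expanded.extend(f"{raw_id}_{i}" for i in range(count))
    return expanded


ids = expand_client_ids(["client_a", "client_b"], 5)
```

With two raw clients and five transformed clients, k is 2 and g is 1, so only the first raw client gets a pseudo-client with index 2.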

Attributes
client_ids A list of string identifiers for clients in this dataset.
dataset_computation A tff.Computation accepting a client ID, returning a dataset.

ClientData implementations that don't support dataset_computation should raise NotImplementedError if this attribute is accessed.

element_type_structure The element type information of the elements returned by datasets in this ClientData object.

Methods

create_tf_dataset_for_client

Creates a new tf.data.Dataset containing the client training examples.

Args
client_id The string client_id for the desired client.

Returns
A tf.data.Dataset object.

create_tf_dataset_from_all_clients

Creates a new tf.data.Dataset containing all client examples.

This function is intended for training centralized, non-distributed models (num_clients=1), which can be a useful point of comparison against federated models.

Currently, the implementation produces a dataset in which all examples from a single client appear consecutively, so additional shuffling should generally be performed.

Args
seed Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers.

Returns
A tf.data.Dataset object.
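To see why extra shuffling is usually needed, consider this library-free sketch of the joining behavior. It assumes client data is a dict mapping client IDs to lists of examples, and it uses random.Random as a stand-in for whatever seeded generator the library uses; join_all_clients is a hypothetical name.

```python
import random


def join_all_clients(client_examples, seed=None):
    """Illustrative only: concatenates client data in a seeded client order."""
    client_ids = list(client_examples)
    # The seed controls only the order in which clients are visited.
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        # Each client's examples stay together, in their original order.
        joined.extend(client_examples[client_id])
    return joined


data = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
joined = join_all_clients(data, seed=0)

# Example-level shuffling is a separate step on the joined data.
shuffled = list(joined)
random.Random(0).shuffle(shuffled)
```

In the joined result, examples are still grouped by client, so a model trained on it without further shuffling would see one client at a time.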

datasets

Yields the tf.data.Dataset for each client in random order.

This function is intended for building a static array of client data to be provided to the top-level federated computation.

Args
limit_count Optional, a maximum number of datasets to return.
seed Optional, a seed to determine the order in which the client datasets are yielded. The seed can be any 32-bit unsigned integer or an array of such integers.

from_clients_and_fn

Constructs a ClientData based on the given function.

Args
client_ids A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
create_tf_dataset_for_client_fn A function that takes a client_id from the above list, and returns a tf.data.Dataset.

Returns
A ClientData.

preprocess

Applies preprocess_fn to each client's data.

train_test_client_split

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args
client_data The base ClientData to split.
num_test_clients How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.

Returns
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint that each has at least 1 batch in its dataset.

Raises
ValueError If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
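The split semantics can be sketched without the library. This sketch assumes client data is a dict mapping client IDs to lists of examples and returns lists of client IDs rather than ClientData objects; the name split_client_ids and the selection strategy are illustrative, not the library's implementation.

```python
import random


def split_client_ids(client_examples, num_test_clients, seed=None):
    """Illustrative only: returns disjoint (train_ids, test_ids)."""
    client_ids = list(client_examples)
    # At least one client must remain for training.
    if not 0 < num_test_clients < len(client_ids):
        raise ValueError("num_test_clients must be between 1 and "
                         "len(client_ids) - 1.")
    random.Random(seed).shuffle(client_ids)

    train_ids, test_ids = [], []
    for client_id in client_ids:
        has_data = len(client_examples[client_id]) > 0
        # Only clients with data may be held out for testing.
        if len(test_ids) < num_test_clients and has_data:
            test_ids.append(client_id)
        else:
            train_ids.append(client_id)

    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    return train_ids, test_ids


data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_client_ids(data, num_test_clients=2, seed=0)
```

Empty clients always end up on the training side, which matches the note that only the test ClientData is guaranteed to have non-empty datasets.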