Transforms client data, potentially expanding by adding pseudo-clients.
Each client of raw_client_data is "expanded" into some number of pseudo-clients. Each pseudo-client ID is a string consisting of the original client ID followed by an underscore and an integer index. For example, the raw client ID "client_a" might be expanded into pseudo-client IDs "client_a_0", "client_a_1" and "client_a_2". A function fn(x) maps a datapoint x to a new datapoint, and the constructor of fn is parameterized by the (raw) client_id and the index i. For example, if x is an image, then make_transform_fn("client_a", 0)(x) might be the identity, while make_transform_fn("client_a", 1)(x) could be a random rotation of the image with the angle determined by a hash of "client_a" and "1". By convention, index 0 typically corresponds to the identity function, if the identity is supported.
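The ID scheme and the index-0 convention described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code. The helpers make_pseudo_client_id, parse_pseudo_client_id, example_make_transform_fn and apply_transform are hypothetical. A "datapoint" is assumed to be a list of numbers, and a cyclic shift stands in for an image rotation. For index 0 the sketch returns None, which this page treats as "no transformation", so it behaves like the identity. The shift size comes from a SHA-256 hash of the raw client ID and the index, which makes it deterministic across runs. Python's built-in hash() for strings is salted per process and would not be.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

Datapoint = List[float]


def make_pseudo_client_id(raw_client_id: str, index: int) -> str:
    """Hypothetical helper: appends '_<index>' to the raw client ID."""
    return f"{raw_client_id}_{index}"


def parse_pseudo_client_id(pseudo_client_id: str) -> Tuple[str, int]:
    """Hypothetical inverse: splits on the LAST underscore only, so raw IDs
    that themselves contain underscores (e.g. 'client_a') survive intact."""
    raw_client_id, index = pseudo_client_id.rsplit("_", 1)
    return raw_client_id, int(index)


def example_make_transform_fn(
    raw_client_id: str, index: int
) -> Optional[Callable[[Datapoint], Datapoint]]:
    """Hypothetical make_transform_fn: index 0 -> None (no transformation);
    other indices -> a cyclic shift sized by a hash of (client ID, index)."""
    if index == 0:
        return None  # Index 0 leaves the data unchanged.
    digest = hashlib.sha256(f"{raw_client_id}/{index}".encode()).digest()
    shift = digest[0]  # Deterministic value in [0, 255].

    def fn(x: Datapoint) -> Datapoint:
        k = shift % len(x) if x else 0
        return x[k:] + x[:k]  # Rotate the datapoint left by k positions.

    return fn


def apply_transform(raw_client_id: str, index: int, x: Datapoint) -> Datapoint:
    """Applies the transform for (raw_client_id, index), treating None as identity."""
    fn = example_make_transform_fn(raw_client_id, index)
    return x if fn is None else fn(x)


if __name__ == "__main__":
    pid = make_pseudo_client_id("client_a", 1)
    print(pid, parse_pseudo_client_id(pid))  # client_a_1 ('client_a', 1)
    print(apply_transform("client_a", 0, [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

Splitting on the last underscore is what makes the mapping reversible for raw IDs such as "client_a". A scheme that split on the first underscore would break on them.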
__init__(raw_client_data, make_transform_fn, num_transformed_clients)
Initializes the TransformingClientData.
raw_client_data: A ClientData to expand.
make_transform_fn: A function that returns a callable that maps a datapoint x to a new datapoint x'. make_transform_fn will be called as make_transform_fn(raw_client_id, i), where i is an integer index, and should return a function fn(x) -> x'. For example, if x is an image, then make_transform_fn("client_a", 0)(x) might be the identity, while make_transform_fn("client_a", 1)(x) could be a random rotation of the image with the angle determined by a hash of "client_a" and "1". If make_transform_fn returns None, no transformation is performed. By convention, index 0 typically corresponds to the identity function, if the identity is supported.
num_transformed_clients: The total number of transformed clients to produce. If it is an integer multiple k of the number of real clients, there will be exactly k pseudo-clients per real client, with indices 0...k-1. Any remainder g will be generated from the first g real clients and will be given index k.
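The following short sketch checks the arithmetic for splitting num_transformed_clients into k pseudo-clients per real client plus a remainder g. The function name expand_client_ids is hypothetical. The order in which it lists the pseudo-client IDs is a choice made here, and the library may order them differently.

```python
from typing import List


def expand_client_ids(raw_client_ids: List[str],
                      num_transformed_clients: int) -> List[str]:
    """Hypothetical helper: lists pseudo-client IDs per the rule above.

    Every real client gets indices 0..k-1, where
    k = num_transformed_clients // len(raw_client_ids); the remainder
    g = num_transformed_clients % len(raw_client_ids) is drawn from the
    first g real clients, each with index k.
    """
    num_real = len(raw_client_ids)
    if num_real == 0:
        raise ValueError("raw_client_ids must be non-empty")
    k, g = divmod(num_transformed_clients, num_real)
    ids = [f"{cid}_{i}" for cid in raw_client_ids for i in range(k)]
    ids += [f"{cid}_{k}" for cid in raw_client_ids[:g]]
    return ids


if __name__ == "__main__":
    # 3 real clients, 7 pseudo-clients: k = 2, g = 1.
    print(expand_client_ids(["a", "b", "c"], 7))
    # ['a_0', 'a_1', 'b_0', 'b_1', 'c_0', 'c_1', 'a_2']
```

When num_transformed_clients is smaller than the number of real clients, k is 0. Only the first g real clients then appear, each with index 0.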
client_ids: The list of string identifiers for clients in this dataset.
element_type_structure: The element type information of the client datasets: a nested structure of tf.TensorSpec objects defining the type of the elements returned by datasets in this ClientData object.
create_tf_dataset_for_client(client_id)
Creates a new tf.data.Dataset containing the client training examples.
client_id: The string client_id for the desired client.
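One way to picture how the dataset for a pseudo-client is built takes three steps. First, recover the raw client ID and index from the pseudo-client ID. Next, fetch the raw client's examples. Finally, map the transform over them. The sketch below shows that flow with plain Python lists standing in for tf.data.Dataset. The names dataset_for_pseudo_client, raw_data and scale_by_index are illustrative assumptions, not TFF APIs. With real tf.data datasets, the mapping step would typically be Dataset.map.

```python
from typing import Callable, Dict, List, Optional

Example = float
TransformFn = Optional[Callable[[Example], Example]]


def dataset_for_pseudo_client(
    raw_data: Dict[str, List[Example]],
    make_transform_fn: Callable[[str, int], TransformFn],
    pseudo_client_id: str,
) -> List[Example]:
    """Hypothetical sketch: returns the transformed examples for one pseudo-client."""
    raw_client_id, index_str = pseudo_client_id.rsplit("_", 1)
    if raw_client_id not in raw_data:
        raise ValueError(f"Unknown client: {pseudo_client_id}")
    examples = raw_data[raw_client_id]
    fn = make_transform_fn(raw_client_id, int(index_str))
    if fn is None:  # None means "no transformation".
        return list(examples)
    return [fn(x) for x in examples]


def scale_by_index(raw_client_id: str, index: int) -> TransformFn:
    # Toy transform: index 0 is the identity, index i multiplies by (i + 1).
    return None if index == 0 else (lambda x: x * (index + 1))


if __name__ == "__main__":
    raw = {"client_a": [1.0, 2.0], "client_b": [5.0]}
    print(dataset_for_pseudo_client(raw, scale_by_index, "client_a_0"))  # [1.0, 2.0]
    print(dataset_for_pseudo_client(raw, scale_by_index, "client_a_2"))  # [3.0, 6.0]
```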
create_tf_dataset_from_all_clients(seed=None)
Creates a new tf.data.Dataset containing all client examples.
NOTE: the returned tf.data.Dataset is neither serializable nor runnable on other devices, as it uses
Currently, the implementation produces a dataset in which each client's examples appear contiguously and in order, so additional shuffling should generally be performed.
seed: Optional, a seed to determine the order in which clients are processed in the joined dataset.
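The ordering note can be made concrete. Joining per-client datasets one client at a time keeps each client's examples contiguous, and a seed only permutes the order of the clients, not the examples within a client. The sketch below uses Python lists and random.Random in place of tf.data. The name all_client_examples is hypothetical, and listing clients in sorted ID order when no seed is given is a choice made here. In a tf.data pipeline, the extra example-level shuffling would typically be a call to Dataset.shuffle with a suitable buffer size.

```python
import random
from typing import Dict, List, Optional


def all_client_examples(data: Dict[str, List[int]],
                        seed: Optional[int] = None) -> List[int]:
    """Hypothetical sketch: concatenates client datasets one client at a time."""
    client_ids = sorted(data)
    if seed is not None:
        # The seed only decides which client's examples come first.
        random.Random(seed).shuffle(client_ids)
    joined: List[int] = []
    for cid in client_ids:
        joined.extend(data[cid])  # Each client's examples stay contiguous.
    return joined


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": [10, 20]}
    print(all_client_examples(data))  # [1, 2, 3, 10, 20]
    # Example-level shuffling still has to be done separately:
    mixed = all_client_examples(data, seed=0)
    random.Random(1).shuffle(mixed)
```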
from_clients_and_fn(cls, client_ids, create_tf_dataset_for_client_fn)
Constructs a ClientData based on the given function.
client_ids: A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
create_tf_dataset_for_client_fn: A function that takes a client_id from the above list and returns a tf.data.Dataset.
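A minimal way to picture from_clients_and_fn is a container that stores the client IDs and defers dataset creation to the supplied function. The class FnBackedClientData below is a hypothetical pure-Python stand-in, not the TFF implementation. It returns lists rather than tf.data.Dataset objects. It rejects an empty ID list, which mirrors the requirement that client_ids be non-empty.

```python
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class FnBackedClientData(Generic[T]):
    """Hypothetical stand-in for a ClientData built from IDs plus a function."""

    def __init__(self, client_ids: Sequence[str],
                 create_dataset_for_client_fn: Callable[[str], List[T]]):
        if not client_ids:
            raise ValueError("client_ids must be a non-empty list.")
        self._client_ids = list(client_ids)
        self._fn = create_dataset_for_client_fn

    @classmethod
    def from_clients_and_fn(cls, client_ids, create_dataset_for_client_fn):
        return cls(client_ids, create_dataset_for_client_fn)

    @property
    def client_ids(self) -> List[str]:
        return list(self._client_ids)

    def create_dataset_for_client(self, client_id: str) -> List[T]:
        if client_id not in self._client_ids:
            raise ValueError(f"ID {client_id!r} is not a valid client_id.")
        return self._fn(client_id)  # Data is produced lazily, per request.


if __name__ == "__main__":
    cd = FnBackedClientData.from_clients_and_fn(
        ["x", "yy"], lambda cid: list(range(len(cid))))
    print(cd.client_ids, cd.create_dataset_for_client("yy"))  # ['x', 'yy'] [0, 1]
```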
preprocess(preprocess_fn)
Applies preprocess_fn to each client's data.
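The effect of applying preprocess_fn to each client's data can be sketched as follows. Wrap each client's dataset-creation function so that preprocess_fn runs on whatever that function returns, and leave the client IDs unchanged. The names preprocess_client_fns and batch_pairs are hypothetical, and lists stand in for datasets. As far as we understand TFF, preprocess_fn operates on a whole tf.data.Dataset rather than on a list.

```python
from typing import Callable, Dict, List


def preprocess_client_fns(
    create_fn: Callable[[str], List[int]],
    preprocess_fn: Callable[[List[int]], List],
) -> Callable[[str], List]:
    """Hypothetical sketch: returns a new per-client function with preprocessing applied."""
    def create_preprocessed(client_id: str) -> List:
        return preprocess_fn(create_fn(client_id))
    return create_preprocessed


def batch_pairs(xs: List[int]) -> List[List[int]]:
    # Toy preprocess_fn: groups examples into batches of two.
    return [xs[i:i + 2] for i in range(0, len(xs), 2)]


if __name__ == "__main__":
    raw: Dict[str, List[int]] = {"a": [1, 2, 3], "b": [4]}
    create = preprocess_client_fns(raw.__getitem__, batch_pairs)
    print(create("a"))  # [[1, 2], [3]]
```

Wrapping the function, rather than precomputing every client's data, keeps dataset creation lazy.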
train_test_client_split(cls, client_data, num_test_clients)
Returns a pair of (train, test) ClientData objects.
This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.
client_data: The base ClientData to split.
num_test_clients: How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, so that the training ClientData is never empty.
A pair (train_client_data, test_client_data), where test_client_data contains num_test_clients clients selected at random, subject to the constraint that each has at least one batch in its dataset.
An error is raised if num_test_clients cannot be satisfied by client_data, or if too many clients have empty datasets.
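The splitting rule can be sketched in four steps:

1. Shuffle the client IDs.
2. Hold out the first num_test_clients clients whose datasets are non-empty as the test set.
3. Leave everything else, including any empty clients, for training.
4. Fail when the bound on num_test_clients is violated or too few non-empty clients exist.

This is a pure-Python illustration with a hypothetical name (train_test_split_ids), using dicts of lists instead of ClientData. The exception type, the seed parameter and the shuffling procedure are choices of this sketch, not documented behavior.

```python
import random
from typing import Dict, List, Optional, Tuple

Data = Dict[str, List[int]]


def train_test_split_ids(client_data: Data, num_test_clients: int,
                         seed: Optional[int] = None) -> Tuple[Data, Data]:
    """Hypothetical sketch: partitions clients into disjoint (train, test) dicts."""
    client_ids = sorted(client_data)
    if not 0 <= num_test_clients <= len(client_ids) - 1:
        raise ValueError("num_test_clients must be at most len(client_ids) - 1.")
    random.Random(seed).shuffle(client_ids)
    test_ids: List[str] = []
    for cid in client_ids:
        if len(test_ids) == num_test_clients:
            break
        if client_data[cid]:  # Only non-empty clients may be held out.
            test_ids.append(cid)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    test = {cid: client_data[cid] for cid in test_ids}
    train = {cid: d for cid, d in client_data.items() if cid not in test}
    return train, test


if __name__ == "__main__":
    data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
    train, test = train_test_split_ids(data, num_test_clients=2, seed=0)
    print(sorted(test), sorted(train))
```

Empty clients are never eligible for the test set, so in the usage example client "b" always lands in train.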