Transforms client data, potentially expanding by adding pseudo-clients.
Inherits From: ClientData
tff.simulation.TransformingClientData(
raw_client_data, make_transform_fn, num_transformed_clients
)
Each client of the raw_client_data is "expanded" into some number of pseudo-clients. Each pseudo-client ID is a string consisting of the original client ID plus a concatenated integer index. For example, the raw client ID "client_a" might be expanded into pseudo-client IDs "client_a_0", "client_a_1" and "client_a_2".
A function fn(x) maps datapoint x to a new datapoint, where the constructor of fn is parameterized by the (raw) client_id and index i. For example, if x is an image, then make_transform_fn("client_a", 0)(x) might be the identity, while make_transform_fn("client_a", 1)(x) could be a random rotation of the image with the angle determined by a hash of "client_a" and "1". By convention, index 0 corresponds to the identity function if the identity is supported.
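The transform-factory idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: the image is a hypothetical list of rows, the rotation is restricted to quarter turns, and `rotate_quarter_turn` is a helper introduced here. The number of turns is derived from a hash of the raw client ID and the index, so the same pseudo-client always gets the same transform.

```python
import hashlib


def rotate_quarter_turn(image):
    """Rotates a list-of-rows image by 90 degrees clockwise (illustrative helper)."""
    return [list(row) for row in zip(*image[::-1])]


def make_transform_fn(raw_client_id, index):
    """Returns fn(x) -> x' for the pseudo-client (raw_client_id, index)."""
    if index == 0:
        # Convention from the text: index 0 is the identity.
        return lambda x: x

    # Assumption: derive the rotation deterministically from a hash of the
    # raw client ID and the index, so repeated calls agree.
    digest = hashlib.sha256(f"{raw_client_id}_{index}".encode("utf-8")).digest()
    quarter_turns = int.from_bytes(digest[:4], "big") % 4

    def transform(x):
        for _ in range(quarter_turns):
            x = rotate_quarter_turn(x)
        return x

    return transform


image = [[1, 2], [3, 4]]
unchanged = make_transform_fn("client_a", 0)(image)  # identity by convention
rotated = make_transform_fn("client_a", 1)(image)    # hash-determined rotation
```

Using a hash rather than a global random generator keeps the transform reproducible across runs without storing any per-client state.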
Args | |
---|---|
raw_client_data
|
A ClientData to expand. |
make_transform_fn
|
A function that returns a callable that maps datapoint
x to a new datapoint x'. make_transform_fn will be called as
make_transform_fn(raw_client_id, i) where i is an integer index, and
should return a function fn(x)->x'. For example, if x is an image, then
make_transform_fn("client_a", 0)(x) might be the identity, while
make_transform_fn("client_a", 1)(x) could be a random rotation of the
image with the angle determined by a hash of "client_a" and "1". If
make_transform_fn returns None, no transformation is performed.
By convention, index 0 corresponds to the identity function
if the identity is supported.
|
num_transformed_clients
|
The total number of transformed clients to produce. If it is an integer multiple k of the number of real clients, there will be exactly k pseudo-clients per real client, with indices 0...k-1. Any remainder g will be generated from the first g real clients and will be given index k. |
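The allocation rule for num_transformed_clients can be illustrated with a small, library-independent sketch. `expand_client_ids` is a hypothetical helper, and the ordering of the returned IDs and of the "first g" raw clients is an assumption made for illustration; only the k-per-client and remainder behaviour comes from the description above.

```python
def expand_client_ids(raw_client_ids, num_transformed_clients):
    """Lists pseudo-client IDs following the k / remainder rule described above."""
    # k full rounds of pseudo-clients, plus g leftover pseudo-clients.
    k, g = divmod(num_transformed_clients, len(raw_client_ids))
    # Every raw client gets indices 0..k-1.
    ids = [f"{cid}_{i}" for cid in raw_client_ids for i in range(k)]
    # The first g raw clients each get one extra pseudo-client with index k.
    ids += [f"{cid}_{k}" for cid in raw_client_ids[:g]]
    return ids


print(expand_client_ids(["client_a", "client_b"], 5))
# ['client_a_0', 'client_a_1', 'client_b_0', 'client_b_1', 'client_a_2']
```

With 2 raw clients and 5 transformed clients, k is 2 and g is 1, so only "client_a" receives the extra index 2.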
Attributes | |
---|---|
client_ids
|
A list of string identifiers for clients in this dataset. |
dataset_computation
|
A tff.Computation accepting a client ID, returning a dataset.
|
element_type_structure
|
The element type information of the client datasets, describing the
elements returned by datasets in this ClientData. |
Methods
create_tf_dataset_for_client
create_tf_dataset_for_client(
client_id
)
Creates a new tf.data.Dataset containing the client training examples.
Args | |
---|---|
client_id
|
The string client_id for the desired client. |
Returns | |
---|---|
A tf.data.Dataset object.
|
create_tf_dataset_from_all_clients
create_tf_dataset_from_all_clients(
seed: Optional[int] = None
) -> tf.data.Dataset
Creates a new tf.data.Dataset containing all client examples.
This function is intended for use in training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.
Currently, the implementation produces a dataset in which all examples from a single client appear together and in order, so additional shuffling should generally be performed.
Args | |
---|---|
seed
|
Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers. |
Returns | |
---|---|
A tf.data.Dataset object.
|
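To see why extra shuffling matters, the following plain-Python sketch mimics a client-by-client join. It is an assumption-laden illustration, not the library's implementation: client data is a hypothetical dict of lists, and `join_all_clients` and `shuffle_examples` are names introduced here. With real tf.data pipelines, tf.data.Dataset.shuffle with a suitable buffer size is the usual tool.

```python
import random


def join_all_clients(client_examples, seed=None):
    """Concatenates per-client example lists, visiting clients in seeded random order."""
    client_ids = list(client_examples)
    random.Random(seed).shuffle(client_ids)
    joined = []
    for cid in client_ids:
        # Each client's examples stay contiguous and in their original order.
        joined.extend(client_examples[cid])
    return joined


def shuffle_examples(examples, seed=None):
    """Returns a shuffled copy so that clients are mixed at the example level."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


data = {"a": ["a0", "a1"], "b": ["b0"], "c": ["c0", "c1", "c2"]}
joined = join_all_clients(data, seed=0)   # blocks of a*, b*, c* in some client order
mixed = shuffle_examples(joined, seed=0)  # examples interleaved across clients
```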
datasets
datasets(
limit_count: Optional[int] = None,
seed: Optional[int] = None
) -> Iterable[tf.data.Dataset]
Yields the tf.data.Dataset for each client in random order.
This function is intended for use in building a static array of client data to be provided to the top-level federated computation.
Args | |
---|---|
limit_count
|
Optional, a maximum number of datasets to return. |
seed
|
Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers. |
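The interplay of limit_count and seed can be sketched with a plain-Python generator. This is a hypothetical stand-in, assuming client data is a dict of lists; `iter_client_datasets` is not a library function and the actual implementation may differ.

```python
import random


def iter_client_datasets(client_examples, limit_count=None, seed=None):
    """Yields each client's data in seeded random order, up to limit_count clients."""
    client_ids = list(client_examples)
    # The seed fixes the order in which clients are visited.
    random.Random(seed).shuffle(client_ids)
    if limit_count is not None:
        client_ids = client_ids[:limit_count]
    for cid in client_ids:
        yield client_examples[cid]


data = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
# Materialize a static list of at most two client datasets.
sampled = list(iter_client_datasets(data, limit_count=2, seed=42))
```

Materializing the generator into a list, as in the last line, gives the kind of static collection of client data the description mentions.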
from_clients_and_fn
@classmethod
from_clients_and_fn(
client_ids: Iterable[str],
create_tf_dataset_for_client_fn: Callable[[str], tf.data.Dataset]
) -> "ConcreteClientData"
Constructs a ClientData based on the given function.
Args | |
---|---|
client_ids
|
A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn. |
create_tf_dataset_for_client_fn
|
A function that takes a client_id from
the above list, and returns a tf.data.Dataset. If this function is
additionally a tff.Computation, the constructed ClientData
will expose a dataset_computation attribute which can be used for
high-performance distributed simulations.
|
Returns | |
---|---|
A ClientData.
|
preprocess
preprocess(
preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> "PreprocessClientData"
Applies preprocess_fn to each client's data.
train_test_client_split
@classmethod
train_test_client_split(
client_data: "ClientData",
num_test_clients: int,
seed: Optional[int] = None
) -> Tuple["ClientData", "ClientData"]
Returns a pair of (train, test) ClientData.
This method partitions the clients of client_data into two ClientData
objects with disjoint sets of ClientData.client_ids. All clients in the
test ClientData are guaranteed to have non-empty datasets, but the
training ClientData may have clients with no data.
Args | |
---|---|
client_data
|
The base ClientData to split.
|
num_test_clients
|
How many clients to hold out for testing. This can be at
most len(client_data.client_ids) - 1, since we don't want to produce
an empty ClientData.
|
seed
|
Optional seed to fix shuffling of clients before splitting. |
Returns | |
---|---|
A pair (train_client_data, test_client_data), where test_client_data
has num_test_clients selected at random, subject to the constraint they
each have at least 1 batch in their dataset.
|
Raises | |
---|---|
ValueError
|
If num_test_clients cannot be satisfied by client_data,
or too many clients have empty datasets.
|
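The split contract described above (disjoint ID sets, non-empty test clients, ValueError when the request cannot be met) can be sketched in plain Python. This is a minimal illustration under assumptions: client data is a hypothetical dict of lists, `split_client_ids` is a name introduced here, and the selection strategy is a guess rather than the library's actual algorithm.

```python
import random


def split_client_ids(client_examples, num_test_clients, seed=None):
    """Partitions client IDs into (train_ids, test_ids) with non-empty test clients."""
    client_ids = list(client_examples)
    if num_test_clients >= len(client_ids):
        # At most len(client_ids) - 1 test clients, so train is never empty.
        raise ValueError("num_test_clients cannot be satisfied by client_data")

    # The seed fixes the shuffle, making the split reproducible.
    random.Random(seed).shuffle(client_ids)

    train_ids, test_ids = [], []
    for cid in client_ids:
        # Only clients with data may be held out for testing.
        if len(test_ids) < num_test_clients and client_examples[cid]:
            test_ids.append(cid)
        else:
            train_ids.append(cid)

    if len(test_ids) < num_test_clients:
        raise ValueError("too many clients have empty datasets")
    return train_ids, test_ids


data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_client_ids(data, num_test_clients=2, seed=0)
```

Note that an empty client such as "b" can only ever land in the training split, which matches the statement that the training ClientData may contain clients with no data.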