tff.simulation.datasets.TransformingClientData

Transforms client data, potentially expanding by adding pseudo-clients.

Inherits From: ClientData

tff.simulation.datasets.TransformingClientData(
    base_client_data: tff.simulation.datasets.ClientData,
    make_transform_fn: Callable[[str], Callable[[Any], Any]],
    expand_client_id: Optional[Callable[[str], list[str]]] = None,
    reduce_client_id: Optional[Callable[[str], str]] = None
)

Each client of the base_client_data is "expanded" into some number of pseudo-clients. A serializable function fn(x) maps a datapoint x to a new datapoint, where the constructor of fn is parameterized by the expanded client_id. For example, if the client_id "client_A" has two expansions, "client_A-0" and "client_A-1", then make_transform_fn("client_A-0")(x) might be the identity, while make_transform_fn("client_A-1")(x) could be a random rotation of the image with the angle determined by a hash of the string "client_A-1".
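To make the ID scheme in this example concrete, here is a minimal plain-Python sketch. It does not call TFF. The helper name `angle_for`, the `-<index>` suffix convention, the use of SHA-256, and the choice of two expansions per client are all assumptions made for illustration. In real use, `reduce_client_id` and the transform would need to be written with TensorFlow ops so that they can be traced.

```python
import hashlib

NUM_EXPANSIONS = 2  # assumption: each base client yields two pseudo-clients


def expand_client_id(client_id: str) -> list[str]:
    """Maps a base id to its expanded ids: client_A -> client_A-0, client_A-1."""
    return [f"{client_id}-{i}" for i in range(NUM_EXPANSIONS)]


def reduce_client_id(expanded_id: str) -> str:
    """Strips the trailing "-<index>" suffix to recover the base client id."""
    base, _, _ = expanded_id.rpartition("-")
    return base


def angle_for(expanded_id: str) -> float:
    """Derives a deterministic rotation angle in [0, 360) from a hash of the id.

    Expansion 0 is treated as the identity (angle 0), mirroring the example
    in the text; this is an illustrative choice, not a TFF rule.
    """
    if expanded_id.endswith("-0"):
        return 0.0
    digest = hashlib.sha256(expanded_id.encode("utf-8")).digest()
    return float(int.from_bytes(digest[:4], "big") % 360)


expanded = expand_client_id("client_A")
print(expanded)
print([reduce_client_id(e) for e in expanded])
print(angle_for("client_A-0"))
```

Because the angle depends only on the expanded id, the same pseudo-client always receives the same transformation across runs.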

Args
`base_client_data`	A ClientData to expand.
`make_transform_fn`	A function to be called as `make_transform_fn(client_id)`, where `client_id` is the expanded client id. It should return a function `transform_fn` that maps a datapoint x, whose element type structure corresponds to `base_client_data`, to a new datapoint x'. It must be traceable as a `tf.function`.
`expand_client_id`	An optional function that maps a client id of `base_client_data` to a list of expanded client ids. If None, the transformed data will have the same size and ids as the original.
`reduce_client_id`	A function that maps an expanded client id back to the raw client id. Must be traceable as a `tf.function`. Must be specified if and only if `expand_client_id` is.
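The arguments above might be wired together as in the following sketch. It assumes that the base dataset's elements are `float32` tensors, that expanded ids use a `-<index>` suffix, and that a hash-derived additive offset is an acceptable stand-in for a real transformation; the helper `build_transformed` is hypothetical. TensorFlow is imported inside the functions that need it so that the pure-Python helper can be tried on its own.

```python
def expand_client_id(client_id: str) -> list[str]:
    # No tracing requirement is stated for this function,
    # so plain Python string formatting is used here.
    return [f"{client_id}-{i}" for i in range(2)]


def reduce_client_id(expanded_id):
    # Must be traceable, so a TensorFlow string op is used instead of str methods.
    import tensorflow as tf
    return tf.strings.regex_replace(expanded_id, r"-[0-9]+$", "")


def make_transform_fn(expanded_id):
    # Hash the expanded id with a TF op so this also works on string tensors.
    import tensorflow as tf
    bucket = tf.strings.to_hash_bucket_fast(expanded_id, 10)
    offset = tf.cast(bucket, tf.float32)
    # Assumption: each datapoint x is a float32 tensor.
    return lambda x: x + offset


def build_transformed(base_client_data):
    # Hypothetical helper: wraps an existing ClientData with the functions above.
    import tensorflow_federated as tff
    return tff.simulation.datasets.TransformingClientData(
        base_client_data,
        make_transform_fn,
        expand_client_id,
        reduce_client_id,
    )
```

Under these assumptions, the resulting object would be expected to expose two expanded ids for every id in `base_client_data.client_ids`.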

Attributes
`client_ids`	A list of string identifiers for clients in this dataset.
`dataset_computation`	A `tff.Computation` accepting a client ID, returning a dataset. Note: the `dataset_computation` property is intended as a TFF-specific performance optimization for distributed execution.
`element_type_structure`	The element type information of the client datasets, that is, of the elements returned by datasets in this `ClientData` object.
`serializable_dataset_fn`	A callable accepting a client ID and returning a `tf.data.Dataset`. Note that this callable must be traceable by TF, as it will be used in the context of a `tf.function`.

Methods

`create_tf_dataset_for_client`

create_tf_dataset_for_client(
    client_id: str
) -> tf.data.Dataset

Creates a new tf.data.Dataset containing the client training examples.

This function will create a dataset for a given client, given that client_id is contained in the client_ids property of the ClientData. Unlike create_dataset, this method need not be serializable.

Args
`client_id`	The string client_id for the desired client.

Returns
A `tf.data.Dataset` object.

`create_tf_dataset_from_all_clients`

create_tf_dataset_from_all_clients(
    seed: Optional[Union[int, Sequence[int]]] = None
) -> tf.data.Dataset

Creates a new tf.data.Dataset containing all client examples.

This function is intended for use in training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.

Currently, the implementation produces a dataset in which each client's examples appear consecutively and in their original order, so additional shuffling should generally be performed.
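The need for extra shuffling can be seen in a simplified plain-Python model of the joined dataset. This is not TFF's implementation: the function `join_all_clients` and the toy `clients` dictionary are assumptions for illustration. With a real `tf.data.Dataset`, the analogous step would be a call such as `dataset.shuffle(buffer_size)`.

```python
import random

# Toy per-client examples (hypothetical data).
clients = {"a": [1, 2, 3], "b": [10, 20], "c": [100]}


def join_all_clients(client_examples, seed=None):
    """Concatenates per-client examples, visiting clients in a seeded random order."""
    ids = sorted(client_examples)
    random.Random(seed).shuffle(ids)
    # Each client's examples are appended as one contiguous run.
    return [x for cid in ids for x in client_examples[cid]]


joined = join_all_clients(clients, seed=0)
print(joined)  # examples from the same client sit next to each other

# Example-level shuffling breaks up the per-client runs.
shuffled = joined[:]
random.Random(1).shuffle(shuffled)
print(shuffled)
```

The seed changes only the order in which clients are visited; without the second shuffle, a centralized model would still see one client's data at a time.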

Args
`seed`	Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or `None`.

Returns
A `tf.data.Dataset` object.

`datasets`

datasets(
    limit_count: Optional[int] = None,
    seed: Optional[Union[int, Sequence[int]]] = None
) -> Iterable[tf.data.Dataset]

Yields the tf.data.Dataset for each client in random order.

This function is intended for use in building a static array of client data to be provided to the top-level federated computation.

Args
`limit_count`	Optional, a maximum number of datasets to return.
`seed`	Optional, a seed to determine the order in which the client datasets are yielded. The seed can be any nonnegative 32-bit integer, an array of such integers, or `None`.

`from_clients_and_tf_fn`

@classmethod
from_clients_and_tf_fn(
    client_ids: Iterable[str],
    serializable_dataset_fn: Callable[[str], tf.data.Dataset]
) -> 'ClientData'

Constructs a ClientData based on the given function.

Args
`client_ids`	A non-empty list of strings to use as input to `serializable_dataset_fn`.
`serializable_dataset_fn`	A function that takes a client_id from the above list, and returns a `tf.data.Dataset`. This function must be serializable and usable within the context of a `tf.function` and `tff.Computation`.

Raises
`TypeError`	If `serializable_dataset_fn` is a `tff.Computation`.

Returns
A `ClientData` object.

`preprocess`

preprocess(
    preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> 'ClientData'

Applies preprocess_fn to each client's data.

Args
`preprocess_fn`	A callable accepting a `tf.data.Dataset` and returning a preprocessed `tf.data.Dataset`. This function must be traceable by TF.

Returns
A `tff.simulation.datasets.ClientData`.

Raises
`IncompatiblePreprocessFnError`	If `preprocess_fn` is a `tff.Computation`.

`train_test_client_split`

@classmethod
train_test_client_split(
    client_data: 'ClientData',
    num_test_clients: int,
    seed: Optional[Union[int, Sequence[int]]] = None
) -> tuple['ClientData', 'ClientData']

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args
`client_data`	The base `ClientData` to split.
`num_test_clients`	How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty `ClientData`.
`seed`	Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or `None`.

Returns
A pair (train_client_data, test_client_data), where test_client_data has `num_test_clients` selected at random, subject to the constraint they each have at least 1 batch in their dataset.

Raises
`ValueError`	If `num_test_clients` cannot be satisfied by `client_data`, or too many clients have empty datasets.
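The partitioning rules described for this method can be illustrated with a simplified plain-Python model operating on client ids only. It is a sketch under stated assumptions, not TFF's implementation: the function `split_client_ids` is hypothetical, and each client's dataset is represented by a plain list.

```python
import random


def split_client_ids(datasets_by_client, num_test_clients, seed=None):
    """Partitions client ids into (train_ids, test_ids).

    Test clients are chosen at random among clients with non-empty data;
    clients with no data can only end up in the training split.
    """
    ids = sorted(datasets_by_client)
    if num_test_clients >= len(ids):
        raise ValueError("num_test_clients must leave at least one training client")
    random.Random(seed).shuffle(ids)
    train_ids, test_ids = [], []
    for cid in ids:
        if len(test_ids) < num_test_clients and datasets_by_client[cid]:
            test_ids.append(cid)
        else:
            train_ids.append(cid)
    if len(test_ids) < num_test_clients:
        raise ValueError("too many clients have empty datasets")
    return train_ids, test_ids


data = {"a": [1], "b": [], "c": [2, 3], "d": []}
train, test = split_client_ids(data, num_test_clients=2, seed=0)
print(sorted(train), sorted(test))
```

In this toy input only two clients have data, so requesting three test clients raises `ValueError`, matching the documented failure mode.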
