Object to hold a federated dataset.

The federated dataset is represented as a list of client ids, and a function to look up the local dataset for each client id.

Each client's local dataset is represented as a TensorFlow dataset, but generally this class (and the corresponding datasets hosted by TFF) can easily be consumed by any Python-based ML framework as numpy arrays:

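import tensorflow as tf
import tensorflow_federated as tff
import tensorflow_datasets as tfds

# Assumption: EMNIST stands in for any ClientData here; load_data() returns
# a (train, test) pair of ClientData objects.
client_data, _ = tff.simulation.datasets.emnist.load_data()
sampled_client_ids = client_data.client_ids

for client_id in sampled_client_ids[:5]:
  client_local_dataset = tfds.as_numpy(
      client_data.create_tf_dataset_for_client(client_id))
  # client_local_dataset is an iterable of structures of numpy arrays
  for example in client_local_dataset:
    print(example)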

client_ids A list of string identifiers for clients in this dataset.
dataset_computation A tff.Computation accepting a client ID, returning a dataset.

ClientData implementations that don't support dataset_computation should raise NotImplementedError if this attribute is accessed.

element_type_structure The element type information of the client datasets, that is, the type of the elements returned by datasets in this ClientData object.



Creates a new dataset containing the given client's training examples.

client_id The string client_id for the desired client.

A dataset object.


Creates a new dataset containing all client examples.

This function is intended for training centralized, non-distributed models (num_clients=1). Such a model can serve as a point of comparison against federated models.

Currently, the implementation produces a dataset that contains all examples from one client, in order, before moving on to the next client, so additional shuffling should generally be performed.

seed Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers.

A dataset object.
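The ordering just described can be modeled in plain Python. This is a hypothetical sketch, not the TFF implementation: a dict mapping client ids to lists stands in for ClientData, `join_all_clients` is an illustrative name, and Python's `random.Random` replaces the library's RNG (so only integer seeds are modeled, not arrays of integers). It shows why each client's examples stay contiguous and why a further shuffle is advisable.

```python
import random
from typing import Dict, List, Optional


def join_all_clients(client_data: Dict[str, List[int]],
                     seed: Optional[int] = None) -> List[int]:
  """Hypothetical model: concatenate every client's examples into one list.

  Clients are visited in an order determined by `seed`; within the result,
  each client's examples remain contiguous and in their original order.
  """
  client_ids = sorted(client_data)  # Deterministic starting order.
  random.Random(seed).shuffle(client_ids)  # Seeded client order.
  joined: List[int] = []
  for client_id in client_ids:
    joined.extend(client_data[client_id])
  return joined


data = {"a": [1, 2], "b": [3, 4, 5], "c": [6]}
joined = join_all_clients(data, seed=0)
print(joined)

# Examples from one client are still adjacent, so shuffle before training.
shuffled = list(joined)
random.Random(1).shuffle(shuffled)
```

Shuffling only the client order keeps the join cheap, but it leaves per-client blocks intact, which is exactly what a downstream example-level shuffle breaks up.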


Yields the dataset for each client, in random order.

This function is intended for building a static array of client data to be provided to the top-level federated computation.

limit_count Optional, a maximum number of datasets to return.
seed Optional, a seed to determine the order in which client datasets are yielded. The seed can be any 32-bit unsigned integer or an array of such integers.
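A hypothetical pure-Python model of this generator, assuming a dict of client id to example list in place of ClientData. The name `iter_client_datasets` is illustrative, and `random.Random` stands in for the library's RNG. The yielded datasets are collected into a static list, as one would for a single round of a federated computation.

```python
import random
from typing import Dict, Iterator, List, Optional


def iter_client_datasets(client_data: Dict[str, List[int]],
                         limit_count: Optional[int] = None,
                         seed: Optional[int] = None) -> Iterator[List[int]]:
  """Hypothetical model: yield each client's dataset in seeded random order."""
  client_ids = sorted(client_data)
  random.Random(seed).shuffle(client_ids)
  if limit_count is not None:
    client_ids = client_ids[:limit_count]  # Yield at most `limit_count`.
  for client_id in client_ids:
    yield client_data[client_id]


data = {"a": [1, 2], "b": [3, 4, 5], "c": [6]}
# Materialize a static list of client datasets, e.g. for one training round.
round_data = list(iter_client_datasets(data, limit_count=2, seed=42))
print(len(round_data))  # 2
```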


Constructs a ClientData based on the given function.

client_ids A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
create_tf_dataset_for_client_fn A function that takes a client_id from the above list and returns a dataset.

A ClientData.
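As an illustration of the representation described at the top of this page (a list of client ids plus a function that looks up each client's dataset), the following hypothetical `SimpleClientData` class mimics this constructor in plain Python. Lists stand in for datasets, and the class and method names are assumptions, not TFF's API.

```python
from typing import Callable, List, Sequence


class SimpleClientData:
  """Hypothetical stand-in for a ClientData built from ids and a function.

  Datasets are created lazily: nothing is built until a client is requested.
  """

  def __init__(self, client_ids: Sequence[str],
               create_dataset_for_client_fn: Callable[[str], List[int]]):
    if not client_ids:
      raise ValueError("client_ids must be non-empty.")
    self.client_ids = list(client_ids)
    self._fn = create_dataset_for_client_fn

  def create_dataset_for_client(self, client_id: str) -> List[int]:
    # Only ids from the original list are valid inputs to the function.
    if client_id not in self.client_ids:
      raise ValueError(f"Unknown client_id: {client_id!r}")
    return self._fn(client_id)


# Each client's "dataset" here is just a list derived from its id.
cd = SimpleClientData(["alice", "bob"], lambda cid: list(range(len(cid))))
print(cd.create_dataset_for_client("bob"))  # [0, 1, 2]
```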


Applies preprocess_fn to each client's data.
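A minimal sketch of what preprocessing means for a function-backed ClientData, assuming the (client ids, lookup function) representation described at the top of this page. The new object wraps the original lookup function so that the preprocessing function runs on each client's dataset when it is requested. The `preprocess_client_data` helper and `ClientDataModel` alias are hypothetical, and whether TFF applies the function lazily in exactly this way is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (client_ids, fn mapping client_id -> dataset).
ClientDataModel = Tuple[List[str], Callable[[str], List[int]]]


def preprocess_client_data(
    client_data: ClientDataModel,
    preprocess_fn: Callable[[List[int]], List[int]]) -> ClientDataModel:
  """Return a new model whose datasets are preprocess_fn(original dataset).

  The original model is left unchanged; preprocessing runs each time a
  client's dataset is requested.
  """
  client_ids, create_fn = client_data
  return list(client_ids), lambda cid: preprocess_fn(create_fn(cid))


raw = (["x", "y"], lambda cid: [1, 2, 3] if cid == "x" else [4])
doubled = preprocess_client_data(raw, lambda ds: [2 * v for v in ds])
print(doubled[1]("x"))  # [2, 4, 6]
```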


Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

client_data The base ClientData to split.
num_test_clients How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, so that the training ClientData is never empty.

A pair (train_client_data, test_client_data), where test_client_data contains num_test_clients clients selected at random, subject to the constraint that each has at least one batch in its dataset.

ValueError If num_test_clients cannot be satisfied by client_data, or if too many clients have empty datasets.
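The split rules above (disjoint client sets, test clients drawn only from clients that have data, at most len(client_ids) - 1 test clients) can be modeled in plain Python. This is a hypothetical sketch, not TFF's implementation: dicts of lists stand in for ClientData, "non-empty list" stands in for "at least one batch", `random.Random` stands in for the library's RNG, and the lower bound of one test client is an assumption.

```python
import random
from typing import Dict, List, Optional, Tuple


def train_test_client_split(
    client_data: Dict[str, List[int]],
    num_test_clients: int,
    seed: Optional[int] = None,
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
  """Hypothetical model of splitting clients into disjoint train/test sets.

  Test clients are drawn at random from clients with non-empty data; every
  remaining client (including empty ones) goes to the training split.
  """
  num_clients = len(client_data)
  # Upper bound follows the text above; the lower bound of 1 is assumed.
  if num_test_clients < 1 or num_test_clients > num_clients - 1:
    raise ValueError(
        f"num_test_clients must be in [1, {num_clients - 1}], "
        f"got {num_test_clients}.")
  non_empty = sorted(cid for cid, ds in client_data.items() if ds)
  if len(non_empty) < num_test_clients:
    raise ValueError("Too many clients have empty datasets.")
  test_ids = set(random.Random(seed).sample(non_empty, num_test_clients))
  train = {cid: ds for cid, ds in client_data.items() if cid not in test_ids}
  test = {cid: client_data[cid] for cid in sorted(test_ids)}
  return train, test


data = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train, test = train_test_client_split(data, num_test_clients=2, seed=0)
print(sorted(train), sorted(test))
```

Because only non-empty clients are eligible for the test split, an empty client such as "b" always lands in the training split, matching the guarantee that test clients have data while training clients may not.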