tff.simulation.FilePerUserClientData

A tff.simulation.ClientData that maps a set of files to a dataset.

Inherits From: ClientData

tff.simulation.FilePerUserClientData(
    client_ids, create_tf_dataset_fn
)

This mapping is restricted to one file per user.

Args:

  • client_ids: A list of client_ids.
  • create_tf_dataset_fn: A callable that takes a client_id and returns a tf.data.Dataset object.

Attributes:

  • client_ids: A list of string identifiers for clients in this dataset.
  • element_type_structure: The element type information of the elements returned by datasets in this ClientData object.

Methods

create_from_dir

@classmethod
create_from_dir(
    cls, path, create_tf_dataset_fn=tf.data.TFRecordDataset
)

Builds a tff.simulation.FilePerUserClientData.

Iterates over all files in path, using the filename as the client ID. Does not recursively search path.

Args:

  • path: A directory path to search for per-client files.
  • create_tf_dataset_fn: A callable that creates a tf.data.Dataset object for a given file in the directory specified in path.

Returns:

A tff.simulation.FilePerUserClientData object.
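The directory scan described above can be illustrated without TFF. The following is a minimal pure-Python sketch, not the library's implementation: `list_client_files` is a hypothetical helper that uses each file name directly under the directory as a client ID and ignores subdirectories. In TFF, each resulting file path would then be passed to create_tf_dataset_fn.

```python
import os
import tempfile


def list_client_files(path):
    """Maps each file name directly under `path` to its full path.

    Hypothetical helper, not part of TFF. It mirrors the documented
    behaviour: the file name is the client ID and the search is not
    recursive.
    """
    client_files = {}
    for entry in sorted(os.listdir(path)):
        full_path = os.path.join(path, entry)
        # Skip subdirectories: only files at the top level count.
        if os.path.isfile(full_path):
            client_files[entry] = full_path
    return client_files


# Build a throwaway directory with two client files and one nested file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("client_a", "client_b"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("example\n")
    os.mkdir(os.path.join(tmp, "nested"))
    with open(os.path.join(tmp, "nested", "client_c"), "w") as f:
        f.write("example\n")

    # The nested file is not picked up, so only two client IDs are found.
    found_client_ids = sorted(list_client_files(tmp))
```

Because the search is not recursive, files placed in subdirectories never become clients in this sketch.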

create_tf_dataset_for_client

create_tf_dataset_for_client(
    client_id
)

Creates a new tf.data.Dataset containing the client training examples.

Args:

  • client_id: The string client_id for the desired client.

Returns:

A tf.data.Dataset object.

create_tf_dataset_from_all_clients

create_tf_dataset_from_all_clients(
    seed=None
)

Creates a new tf.data.Dataset containing all client examples.

This function is intended for training centralized, non-distributed models (num_clients=1), which can be a useful point of comparison against federated models.

Currently, the implementation produces a dataset in which all examples from a single client appear consecutively, in order, so additional shuffling should generally be performed.

Args:

  • seed: Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers.

Returns:

A tf.data.Dataset object.
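To see why extra shuffling is recommended, consider a minimal pure-Python sketch of this kind of concatenation. It is not the TFF implementation: plain lists stand in for tf.data.Dataset objects, `concat_all_clients` is a hypothetical helper, and the use of Python's `random` module for seeding is an assumption.

```python
import random


def concat_all_clients(client_examples, seed=None):
    """Joins per-client example lists, visiting clients in a seeded order.

    Hypothetical stand-in for the documented method; lists play the role
    of tf.data.Dataset objects.
    """
    client_ids = sorted(client_examples)
    # The seed only controls the order in which clients are visited.
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        # Each client's examples are appended as one contiguous block.
        joined.extend((client_id, x) for x in client_examples[client_id])
    return joined


client_examples = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
joined = concat_all_clients(client_examples, seed=42)

# Example-level shuffle, so that one client's data no longer forms a block.
shuffled = list(joined)
random.Random(0).shuffle(shuffled)
```

With a real tf.data.Dataset, the example-level shuffle would typically be done with the dataset's own shuffle method and a suitable buffer size.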

datasets

datasets(
    limit_count=None, seed=None
)

Yields the tf.data.Dataset for each client in random order.

This function is intended for use building a static array of client data to be provided to the top-level federated computation.

Args:

  • limit_count: Optional, a maximum number of datasets to return.
  • seed: Optional, a seed to determine the order in which the client datasets are yielded. The seed can be any 32-bit unsigned integer or an array of such integers.
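The interplay of limit_count and seed can be sketched as a small generator. This is a pure-Python illustration under stated assumptions, not the TFF code: `iter_client_datasets` and `make_dataset` are hypothetical names, lists stand in for tf.data.Dataset objects, and seeding uses Python's `random` module.

```python
import random


def iter_client_datasets(client_ids, create_dataset_fn, limit_count=None, seed=None):
    """Yields one dataset per client in a seeded random order.

    Hypothetical stand-in for the documented method.
    """
    order = list(client_ids)
    random.Random(seed).shuffle(order)
    if limit_count is not None:
        # Keep at most `limit_count` clients from the shuffled order.
        order = order[:limit_count]
    for client_id in order:
        yield create_dataset_fn(client_id)


def make_dataset(client_id):
    """Hypothetical dataset factory: a one-element list per client."""
    return [f"{client_id}-example"]


client_ids = ["a", "b", "c", "d"]

# Materialize a static list of client data, as the description suggests.
static_client_data = list(
    iter_client_datasets(client_ids, make_dataset, limit_count=2, seed=1)
)
# Reusing the seed reproduces the same selection and order.
same_again = list(
    iter_client_datasets(client_ids, make_dataset, limit_count=2, seed=1)
)
```

Fixing the seed makes the sampled subset of clients reproducible across runs, which is convenient when comparing experiments.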

from_clients_and_fn

@classmethod
from_clients_and_fn(
    cls, client_ids, create_tf_dataset_for_client_fn
)

Constructs a ClientData based on the given function.

Args:

  • client_ids: A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
  • create_tf_dataset_for_client_fn: A function that takes a client_id from the above list, and returns a tf.data.Dataset.

Returns:

A ClientData.

preprocess

preprocess(
    preprocess_fn
)

Applies preprocess_fn to each client's data.

train_test_client_split

@classmethod
train_test_client_split(
    cls, client_data, num_test_clients
)

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args:

  • client_data: The base ClientData to split.
  • num_test_clients: How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.

Returns:

A pair (train_client_data, test_client_data), where test_client_data contains num_test_clients clients selected at random, subject to the constraint that each has at least 1 batch in its dataset.

Raises:

  • ValueError: If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
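The splitting rules above (disjoint client sets, non-empty test clients, ValueError on an unsatisfiable request) can be sketched in pure Python. This is an illustration, not the TFF implementation: `split_clients` is a hypothetical helper that works on client IDs, lists stand in for datasets, and rejecting num_test_clients below 1 is an assumption.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Splits client IDs into (train_ids, test_ids) with non-empty test clients.

    Hypothetical stand-in for the documented method.
    """
    client_ids = sorted(client_examples)
    # Upper bound from the docs; the lower bound of 1 is an assumption.
    if not 0 < num_test_clients < len(client_ids):
        raise ValueError("num_test_clients must be in [1, len(client_ids) - 1]")
    random.Random(seed).shuffle(client_ids)
    test_ids = []
    for client_id in client_ids:
        if len(test_ids) == num_test_clients:
            break
        # Only clients that actually have data may be held out for testing.
        if client_examples[client_id]:
            test_ids.append(client_id)
    if len(test_ids) < num_test_clients:
        raise ValueError("too many clients have empty datasets")
    held_out = set(test_ids)
    # Everything else, including empty clients, goes to the training side.
    train_ids = [c for c in client_ids if c not in held_out]
    return train_ids, test_ids


client_examples = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_clients(client_examples, 2, seed=0)
```

Note that the empty client "b" can only land on the training side, matching the statement that the training ClientData may contain clients with no data.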