

Object to hold a federated dataset.

The federated dataset is represented as a list of client ids, and a function to look up the local dataset for each client id.

Each client's local dataset is represented as a tf.data.Dataset, but generally this class (and the corresponding datasets hosted by TFF) can easily be consumed by any Python-based ML framework as numpy arrays:

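import tensorflow as tf
import tensorflow_federated as tff
import tensorflow_datasets as tfds

# Assumption: `client_data` is an existing ClientData instance and
# `sampled_client_ids` is a list of IDs drawn from client_data.client_ids.
for client_id in sampled_client_ids[:5]:
  client_local_dataset = tfds.as_numpy(
      client_data.create_tf_dataset_for_client(client_id))
  # client_local_dataset is an iterable of structures of numpy arrays
  for example in client_local_dataset:
    print(example)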

To construct ClientData objects for testing purposes, see the tff.simulation.datasets.TestClientData class, which provides an easy way to build toy federated datasets.

client_ids A list of string identifiers for clients in this dataset.
dataset_computation A tff.Computation accepting a client ID, returning a dataset.

ClientData implementations that don't support dataset_computation should raise NotImplementedError if this attribute is accessed.

element_type_structure The element type information of the client datasets: a nested structure of tf.TensorSpec objects that matches the structure of elements returned by datasets in this ClientData object.




Creates a new tf.data.Dataset containing the client training examples.

client_id The string client_id for the desired client.

A tf.data.Dataset object.



Creates a new tf.data.Dataset containing all client examples.

This function is intended for training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.

Currently, the implementation produces a dataset in which all examples from a single client appear together, in order, so additional shuffling should generally be performed.

seed Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers.

A tf.data.Dataset object.
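
As a rough illustration of the behaviour described above, the following plain-Python sketch joins per-client example lists in a seeded client order and then shuffles at the example level. It does not use TFF: `join_all_clients`, `client_examples`, and the use of `random.Random` are assumptions made for illustration, and the real method returns a tf.data.Dataset rather than a list.

```python
import random


def join_all_clients(client_examples, seed=None):
    """Concatenate every client's examples, visiting clients in a seeded order.

    `client_examples` is a hypothetical dict mapping client ID -> list of examples.
    """
    client_ids = sorted(client_examples)
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        # Each client's examples stay contiguous and in their original order.
        joined.extend(client_examples[client_id])
    return joined


client_examples = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
joined = join_all_clients(client_examples, seed=0)

# Examples are still grouped by client, so shuffle again at the example level
# before using the result for centralized training.
shuffled = list(joined)
random.Random(1).shuffle(shuffled)
```

The second shuffle is the "additional shuffling" step: without it, a centralized model would see one client's data at a time.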



Yields the tf.data.Dataset for each client in random order.

This function is intended for building a static array of client data to be provided to the top-level federated computation.

limit_count Optional, a maximum number of datasets to return.
seed Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any 32-bit unsigned integer or an array of such integers.
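
The interplay of limit_count and seed can be sketched with a small generator. This is a plain-Python approximation, not TFF code: `iter_client_datasets`, `lookup_fn`, and the toy `data` dict are hypothetical names, and the shuffling algorithm is an assumption.

```python
import random


def iter_client_datasets(client_ids, lookup_fn, limit_count=None, seed=None):
    """Yield lookup_fn(client_id) for clients in a seeded random order."""
    order = list(client_ids)
    random.Random(seed).shuffle(order)
    if limit_count is not None:
        # Keep only the first `limit_count` clients of the shuffled order.
        order = order[:limit_count]
    for client_id in order:
        yield lookup_fn(client_id)


data = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
first_two = list(
    iter_client_datasets(sorted(data), data.__getitem__, limit_count=2, seed=42))
```

Fixing the seed makes the selection reproducible, which is convenient when the same static list of client datasets must be rebuilt across runs.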



Constructs a ClientData based on the given function.

client_ids A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
create_tf_dataset_for_client_fn A function that takes a client_id from the above list, and returns a tf.data.Dataset. If this function is additionally a tff.Computation, the constructed ClientData will expose a dataset_computation attribute which can be used for high-performance distributed simulations.

A ClientData.
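
The idea of building a federated dataset from a list of client IDs plus a lookup function can be sketched without TFF. The class below is a hypothetical stand-in, not the library's implementation: `SimpleClientData`, `make_dataset`, and the error raised for unknown IDs are assumptions, and plain lists take the place of tf.data.Dataset objects.

```python
class SimpleClientData:
    """Hypothetical stand-in: client IDs plus a per-client lookup function."""

    def __init__(self, client_ids, create_dataset_for_client_fn):
        if not client_ids:
            raise ValueError("client_ids must be a non-empty list")
        self.client_ids = list(client_ids)
        self._create_fn = create_dataset_for_client_fn

    def create_dataset_for_client(self, client_id):
        # Assumption of this sketch: unknown IDs are rejected with ValueError.
        if client_id not in self.client_ids:
            raise ValueError(f"Unknown client_id: {client_id!r}")
        return self._create_fn(client_id)


def make_dataset(client_id):
    # Toy "dataset": three examples derived from the length of the ID.
    return [len(client_id) * i for i in range(3)]


client_data = SimpleClientData(["alice", "bob"], make_dataset)
bob_examples = client_data.create_dataset_for_client("bob")
```

Datasets are created on demand by the lookup function, so nothing is materialized until a specific client is requested.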



Applies preprocess_fn to each client's data.
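
One way to picture per-client preprocessing is as a wrapper around the client lookup function. The sketch below is plain Python rather than TFF code: `preprocess_clients` and `raw` are hypothetical, and it assumes preprocess_fn maps one client's whole dataset to a new dataset.

```python
def preprocess_clients(lookup_fn, preprocess_fn):
    """Return a new lookup function whose results pass through preprocess_fn."""

    def preprocessed_lookup(client_id):
        # Preprocessing is applied when a client's data is requested.
        return preprocess_fn(lookup_fn(client_id))

    return preprocessed_lookup


raw = {"a": [1, 2], "b": [3]}
doubled_lookup = preprocess_clients(
    raw.__getitem__, lambda examples: [2 * x for x in examples])
doubled_a = doubled_lookup("a")
```

The original data is left untouched; only the view returned by the new lookup function changes.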



Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

client_data The base ClientData to split.
num_test_clients How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.
seed Optional seed to fix shuffling of clients before splitting.

A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint they each have at least 1 batch in their dataset.

ValueError If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
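
The splitting rules above (disjoint client sets, non-empty test clients, ValueError when the request cannot be met) can be sketched in plain Python. This is not the TFF implementation: `split_clients`, `client_examples`, the shuffling algorithm, and the lower bound on num_test_clients are assumptions, and example counts stand in for batch counts.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Split client IDs into disjoint (train, test) lists.

    Test clients must have at least one example; raises ValueError otherwise.
    """
    client_ids = sorted(client_examples)
    # Upper bound follows the text; the lower bound is an assumption.
    if num_test_clients <= 0 or num_test_clients >= len(client_ids):
        raise ValueError("num_test_clients must be in [1, number of clients - 1]")
    random.Random(seed).shuffle(client_ids)
    test_ids, train_ids = [], []
    for client_id in client_ids:
        if len(test_ids) < num_test_clients and client_examples[client_id]:
            test_ids.append(client_id)
        else:
            # Empty clients, and any clients left over, go to the training split.
            train_ids.append(client_id)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets")
    return train_ids, test_ids


client_examples = {"a": [1], "b": [], "c": [2, 3], "d": [4]}
train_ids, test_ids = split_clients(client_examples, num_test_clients=2, seed=0)
```

Because empty clients are never placed in the test split, they end up in the training split, which matches the note that the training ClientData may contain clients with no data.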