A tff.simulation.ClientData backed by an HDF5 file.
This class expects that the HDF5 file has a top-level group that contains further subgroups, one per user, named by the user ID.
The tf.data.Dataset returned by HDF5ClientData.create_tf_dataset_for_client(client_id) yields tuples from zipping all datasets that were found at the /data/client_id group, in a similar fashion to tf.data.Dataset.from_tensor_slices().
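The zipping behaviour described above can be illustrated without TensorFlow or an HDF5 file. The following is a minimal sketch only: a plain dictionary stands in for one client's group, the dataset names `pixels` and `label` are hypothetical, and sorting the keys to fix the field order is an assumption of this sketch rather than documented behaviour.

```python
# Plain-Python stand-in for one client's group: each key plays the role of a
# dataset stored under that client's group (the names are hypothetical).
client_group = {
    "pixels": [[0, 1], [2, 3], [4, 5]],
    "label": [7, 8, 9],
}


def iter_zipped_examples(group):
    """Yield one tuple per example by zipping all datasets in the group."""
    # Sorting the keys fixes the order of fields in each tuple
    # (an assumption of this sketch).
    keys = sorted(group)
    for values in zip(*(group[key] for key in keys)):
        yield tuple(values)


examples = list(iter_zipped_examples(client_group))
for example in examples:
    print(example)
```

Each yielded tuple holds the i-th row of every dataset, which is the same slicing along the first axis that tf.data.Dataset.from_tensor_slices() performs on a structure of tensors.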
hdf5_filepath: String path to the hdf5 file.
The list of string identifiers for clients in this dataset.
The element type information of the client datasets. A nested structure of tf.TensorSpec objects defining the type of the elements returned by datasets in this ClientData object.
Creates a new tf.data.Dataset containing the client training examples.
client_id: The string client_id for the desired client.
Creates a new tf.data.Dataset containing all client examples.
NOTE: the returned tf.data.Dataset is not serializable and runnable on other devices, as it uses tf.py_func internally.
Currently, the implementation produces a dataset that contains all examples from a single client in order, and so generally additional shuffling should be performed.
seed: Optional, a seed to determine the order in which clients are processed in the joined dataset.
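To make the ordering caveat concrete, here is a minimal plain-Python sketch of a joined dataset. It is not the library's implementation: `join_all_clients` and `shuffle_examples` are hypothetical helpers, lists stand in for tf.data.Dataset objects, and the use of `random.Random(seed)` to order clients is an assumption.

```python
import random


def join_all_clients(client_examples, seed=None):
    """Concatenate every client's examples, visiting clients in a seeded order."""
    client_ids = sorted(client_examples)
    # The seed only controls the order in which clients are visited.
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        # All examples of one client stay together, in their original order.
        joined.extend(client_examples[client_id])
    return joined


def shuffle_examples(examples, seed=None):
    """Return a shuffled copy, mixing examples across clients."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


data = {"a": ["a0", "a1"], "b": ["b0"], "c": ["c0", "c1", "c2"]}
joined = join_all_clients(data, seed=1)
mixed = shuffle_examples(joined, seed=2)
print(joined)
print(mixed)
```

Because each client's examples remain contiguous in `joined`, a training loop that reads it sequentially would see one client at a time, which is why an extra shuffle step is usually wanted.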
from_clients_and_fn(cls, client_ids, create_tf_dataset_for_client_fn)
Constructs a ClientData based on the given function.
client_ids: A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
create_tf_dataset_for_client_fn: A function that takes a client_id from the above list, and returns a tf.data.Dataset.
Applies preprocess_fn to each client's data.
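The relationship between a function-backed ClientData and per-client preprocessing can be sketched in plain Python. `SketchClientData` below is a hypothetical stand-in, not the tff.simulation.ClientData class: lists take the place of tf.data.Dataset objects, and the error types raised are assumptions of this sketch.

```python
class SketchClientData:
    """Hypothetical stand-in for a ClientData built from a per-client function."""

    def __init__(self, client_ids, create_dataset_fn):
        if not client_ids:
            # Mirrors the documented requirement of a non-empty client list.
            raise ValueError("client_ids must be non-empty")
        self.client_ids = list(client_ids)
        self._create_dataset_fn = create_dataset_fn

    @classmethod
    def from_clients_and_fn(cls, client_ids, create_dataset_for_client_fn):
        return cls(client_ids, create_dataset_for_client_fn)

    def create_dataset_for_client(self, client_id):
        if client_id not in self.client_ids:
            raise KeyError(client_id)
        return self._create_dataset_fn(client_id)

    def preprocess(self, preprocess_fn):
        """Return a new object that applies preprocess_fn to each client's data."""
        base_fn = self._create_dataset_fn

        def preprocessed(client_id):
            return preprocess_fn(base_fn(client_id))

        return SketchClientData(self.client_ids, preprocessed)


client_data = SketchClientData.from_clients_and_fn(
    ["a", "b"], lambda client_id: [client_id + "0", client_id + "1"]
)
upper = client_data.preprocess(lambda dataset: [x.upper() for x in dataset])
print(upper.create_dataset_for_client("a"))
```

In this sketch the preprocessing is lazy: nothing is computed until a client's dataset is requested, and the original object is left unchanged.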
train_test_client_split(cls, client_data, num_test_clients)
Returns a pair of (train, test) ClientData.
This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.
client_data: The base ClientData to split.
num_test_clients: How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint they each have at least 1 batch in their dataset.
Raised if num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
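The splitting rules above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: `split_clients` is a hypothetical helper, a dictionary of lists stands in for the ClientData, raising `ValueError` is this sketch's choice of exception, and the lower bound of one test client is also an assumption.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Partition client ids into (train_ids, test_ids) with non-empty test clients."""
    client_ids = sorted(client_examples)
    # The upper bound follows the documented limit of len(client_ids) - 1;
    # the lower bound of 1 is an assumption of this sketch.
    if not 1 <= num_test_clients <= len(client_ids) - 1:
        raise ValueError("num_test_clients cannot be satisfied")

    random.Random(seed).shuffle(client_ids)
    train_ids, test_ids = [], []
    for client_id in client_ids:
        has_data = len(client_examples[client_id]) > 0
        if len(test_ids) < num_test_clients and has_data:
            test_ids.append(client_id)
        else:
            # Clients with no data can only ever land in the training set.
            train_ids.append(client_id)

    if len(test_ids) < num_test_clients:
        raise ValueError("too many clients have empty datasets")
    return sorted(train_ids), sorted(test_ids)


data = {"a": [1], "b": [], "c": [2, 3], "d": []}
train_ids, test_ids = split_clients(data, num_test_clients=2, seed=0)
print(train_ids, test_ids)
```

With this input only two clients have data, so both must be chosen for the test set whatever the seed, and asking for three test clients fails because too many clients are empty.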