tff.simulation.HDF5ClientData

Class HDF5ClientData

A tff.simulation.ClientData backed by an HDF5 file.

Inherits From: ClientData

This class expects that the HDF5 file has a top-level group examples which contains further subgroups, one per user, named by the user ID.
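As an illustration of that layout, the following sketch writes a small file with h5py and reads it back. The client IDs (alice, bob), the dataset names (label, pixels), and the file name are made up for the example; only the top-level examples group and the one-subgroup-per-client arrangement come from the description above.

```python
import os
import tempfile

import h5py
import numpy as np


def write_example_file(path):
    """Writes a tiny HDF5 file laid out as examples/<client_id>/<dataset>."""
    # Hypothetical clients and features, chosen only for illustration.
    clients = {
        "alice": {"label": np.array([0, 1, 1]), "pixels": np.zeros((3, 4))},
        "bob": {"label": np.array([1]), "pixels": np.ones((1, 4))},
    }
    with h5py.File(path, "w") as f:
        examples = f.create_group("examples")  # the top-level group
        for client_id, features in clients.items():
            client_group = examples.create_group(client_id)
            for name, values in features.items():
                # Axis 0 indexes examples, so every dataset of a client
                # shares the same leading dimension.
                client_group.create_dataset(name, data=values)


path = os.path.join(tempfile.mkdtemp(), "toy_clients.h5")
write_example_file(path)

# Read the file back to confirm the structure.
with h5py.File(path, "r") as f:
    client_ids = sorted(f["examples"].keys())
    alice_labels = f["examples"]["alice"]["label"][()]

print(client_ids)    # ['alice', 'bob']
print(alice_labels)  # [0 1 1]
```

A file written this way is the kind of input the hdf5_filepath constructor argument is meant to point at.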

The tf.data.Dataset returned by HDF5ClientData.create_tf_dataset_for_client(client_id) yields tuples from zipping all datasets found in the /examples/client_id group (the subgroup for that client under the top-level examples group), in a similar fashion to tf.data.Dataset.from_tensor_slices().
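The zipping behaviour can be pictured without TensorFlow. The sketch below is a plain-Python stand-in, not the library's code: slice_client_group is a hypothetical helper that takes the datasets of one client group as equal-length lists and yields one tuple per example, the way tf.data.Dataset.from_tensor_slices() slices along the first axis. Ordering the components by dataset name is an assumption made here to keep the output deterministic.

```python
def slice_client_group(group):
    """Yields one tuple per example by zipping a client's datasets.

    `group` maps dataset names to equal-length sequences (a hypothetical
    stand-in for the HDF5 datasets found in one client's group).
    """
    names = sorted(group)  # assumption: a fixed, name-based component order
    lengths = {len(group[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("All datasets must share the same first dimension.")
    # zip(*columns) pairs the i-th entry of every dataset into one element.
    yield from zip(*(group[name] for name in names))


client_group = {"pixels": [[0, 0], [1, 1], [2, 2]], "label": [7, 8, 9]}
elements = list(slice_client_group(client_group))
print(elements)
# [(7, [0, 0]), (8, [1, 1]), (9, [2, 2])]
```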

__init__

__init__(hdf5_filepath)

Constructs a tff.simulation.ClientData object.

Args:

  • hdf5_filepath: String path to the hdf5 file.

Properties

client_ids

The list of string identifiers for clients in this dataset.

output_shapes

Returns the shape of each component of an element of the client datasets.

Any tf.data.Dataset constructed by this class is expected to have matching output_shapes properties when accessed via tf.compat.v1.data.get_output_shapes(dataset).

Returns:

A nested structure of tf.TensorShape objects corresponding to each component of an element of the client datasets.

output_types

Returns the type of each component of an element of the client datasets.

Any tf.data.Dataset constructed by this class is expected to have matching output_types properties when accessed via tf.compat.v1.data.get_output_types(dataset).

Returns:

A nested structure of tf.DType objects corresponding to each component of an element of the client datasets.

Methods

create_tf_dataset_for_client

create_tf_dataset_for_client(client_id)

Creates a new tf.data.Dataset containing the client training examples.

Args:

  • client_id: The string client_id for the desired client.

Returns:

A tf.data.Dataset object.

create_tf_dataset_from_all_clients

create_tf_dataset_from_all_clients(seed=None)

Creates a new tf.data.Dataset containing all client examples.

NOTE: the returned tf.data.Dataset is neither serializable nor runnable on other devices, because it uses tf.py_func internally.

Currently, the implementation produces a dataset in which all examples from a given client appear consecutively and in order, so additional shuffling should generally be performed.
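To see why extra shuffling is usually needed, here is a plain-Python sketch of the ordering described above. It is not the library's implementation: join_clients and the toy data are hypothetical, and the final random shuffle stands in for something like tf.data.Dataset.shuffle applied to the returned dataset.

```python
import random


def join_clients(client_examples, seed=None):
    """Concatenates per-client examples, visiting clients in seeded order."""
    client_ids = sorted(client_examples)
    random.Random(seed).shuffle(client_ids)  # the seed only reorders clients
    joined = []
    for client_id in client_ids:
        # Each client's examples stay together and in their original order.
        joined.extend((client_id, x) for x in client_examples[client_id])
    return joined


toy = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}
joined = join_clients(toy, seed=0)

# Clients appear as contiguous runs, so consecutive examples are correlated.
owners = [client_id for client_id, _ in joined]
runs = [
    owners[i]
    for i in range(len(owners))
    if i == 0 or owners[i] != owners[i - 1]
]
print(len(runs) == len(toy))  # True: exactly one run per client

# Shuffling at the example level breaks up those runs.
shuffled = list(joined)
random.Random(1).shuffle(shuffled)
```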

Args:

  • seed: Optional, a seed to determine the order in which clients are processed in the joined dataset.

Returns:

A tf.data.Dataset object.

from_clients_and_fn

from_clients_and_fn(
    cls,
    client_ids,
    create_tf_dataset_for_client_fn
)

Constructs a ClientData based on the given function.

Args:

  • client_ids: A non-empty list of client_ids which are valid inputs to the create_tf_dataset_for_client_fn.
  • create_tf_dataset_for_client_fn: A function that takes a client_id from the above list, and returns a tf.data.Dataset.

Returns:

A ClientData.

preprocess

preprocess(preprocess_fn)

Applies preprocess_fn to each client's data.

train_test_client_split

train_test_client_split(
    cls,
    client_data,
    num_test_clients
)

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args:

  • client_data: The base ClientData to split.
  • num_test_clients: How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.

Returns:

A pair (train_client_data, test_client_data), where test_client_data contains num_test_clients clients selected at random, subject to the constraint that each has at least 1 batch in its dataset.

Raises:

  • ValueError: If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
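
The contract above can be sketched in plain Python. This is an illustration of the documented behaviour, not the library's source: split_clients is a hypothetical helper, client data is modelled as a dict from client ID to a list of examples, and the sampling strategy (shuffle the IDs, then take the first non-empty clients) is an assumption.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Returns (train_ids, test_ids): disjoint IDs, non-empty test clients."""
    client_ids = sorted(client_examples)
    if num_test_clients >= len(client_ids):
        raise ValueError("num_test_clients must be at most len(client_ids) - 1.")
    random.Random(seed).shuffle(client_ids)
    # Only clients that actually hold data are eligible for the test set.
    test_ids = [c for c in client_ids if client_examples[c]][:num_test_clients]
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")
    chosen = set(test_ids)
    train_ids = [c for c in client_ids if c not in chosen]
    return train_ids, test_ids


toy = {"a": [1, 2], "b": [], "c": [3], "d": [4, 5, 6]}
train_ids, test_ids = split_clients(toy, num_test_clients=2, seed=0)
print(set(train_ids).isdisjoint(test_ids))  # True
print("b" in train_ids)  # True: the empty client can only land in train
```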