Missed TensorFlow World? Check out the recap. Learn more

Using TFF for Federated Learning Research

Note: This page is currently being populated


TFF is an extensible, powerful framework for conducting federated learning (FL) research by simulating federated computations on realistic proxy datasets. This page describes the main concepts and components that are relevant for research simulations, as well as detailed guidance for conducting different kinds of research in TFF.

The typical structure of research code in TFF

A research FL simulation implemented in TFF typically consists of three main types of logic.

  1. Individual pieces of TensorFlow code, typically tf.functions, that encapsulate logic that runs in a single location (e.g., on clients or on a server). This code is typically written and tested without any tff.* references, and can be re-used outside of TFF. For example, the local training loop in Federated Averaging is implemented at this level.

  2. TensorFlow Federated orchestration logic, which binds together the individual tf.functions from 1. by wrapping them as tff.tf_computations and then orchestrating them using abstractions like tff.federated_broadcast and tff.federated_mean inside a tff.federated_comutation. For example, this orchestration for Federated Averaging.

  3. An outer driver script that simulates the control logic of a production FL system, selecting simulated clients from a dataset and then executing federated comptuations defined in 2. on those clients. For example, a Federated EMNIST experiment driver.

Federated learning datasets

TensorFlow federated hosts multiple datasets that are representative of the characteristics of real-world problems that could be solved with federated learning. Datasets include:

  • StackOverflow. A realistic text dataset for language modeling or supervised learning tasks, with 342,477 unique users with 135,818,730 examples (sentences) in the training set.

  • Federated EMNIST. A federated pre-processing of the EMNIST character and digit dataset, where each client corresponds to a different writer. The full train set contains 3400 users with 671,585 examples from 62 labels.

  • Shakespeare. A smaller char-level text dataset based on the complete works of William Shakespeare. The data set consists of 715 users (characters of Shakespeare plays), where each example corresponds to a contiguous set of lines spoken by the character in a given play.

High performance simulations

While the wall-clock time of an FL simulation is not a relevant metric for evaluating algorithms (as simulation hardware isn't representative of real FL deployment environments), being able to run FL simulations quickly is critical for research productivity. Hence, TFF has invested heavily in providing high-performance single and multi-machine runtimes. Documentation is under development, but for now see the High-performance simulations with TFF tutorial as well as instructions on setting up simulations with TFF on GCP. For fast-single machine experiments, use


This should become the default soon.

TFF for different research areas

Federated optimization algorithms

Model and update compression

Meta-learning and multi-task learning

Differential privacy

Robustness and attacks