Custom Federated Algorithms, Part 1: Introduction to the Federated Core

This tutorial is the first part of a two-part series that demonstrates how to implement custom types of federated algorithms in TensorFlow Federated (TFF) using the Federated Core (FC) - a set of lower-level interfaces that serve as a foundation upon which we have implemented the Federated Learning (FL) layer.

This first part is more conceptual; we introduce some of the key concepts and programming abstractions used in TFF, and we demonstrate their use on a very simple example with a distributed array of temperature sensors. In the second part of this series, we use the mechanisms we introduce here to implement a simple version of federated training and evaluation algorithms. As a follow-up, we encourage you to study the implementation of federated averaging in tff.learning.

By the end of this series, you should be able to recognize that the applications of Federated Core are not necessarily limited to learning. The programming abstractions we offer are quite generic, and could be used, e.g., to implement analytics and other custom types of computations over distributed data.

Although this tutorial is designed to be self-contained, we encourage you to first read tutorials on image classification and text generation for a higher-level and more gentle introduction to the TensorFlow Federated framework and the Federated Learning APIs (tff.learning), as it will help you put the concepts we describe here in context.

Intended Uses

In a nutshell, Federated Core (FC) is a development environment that makes it possible to compactly express program logic that combines TensorFlow code with distributed communication operators, such as those that are used in Federated Averaging - computing distributed sums, averages, and other types of distributed aggregations over a set of client devices in the system, broadcasting models and parameters to those devices, etc.

You may be aware of tf.contrib.distribute, and a natural question to ask at this point may be: in what ways does this framework differ? Both frameworks attempt to make TensorFlow computations distributed, after all.

One way to think about it is that, whereas the stated goal of tf.contrib.distribute is to allow users to use existing models and training code with minimal changes to enable distributed training, and much focus is on how to take advantage of distributed infrastructure to make existing training code more efficient, the goal of TFF's Federated Core is to give researchers and practitioners explicit control over the specific patterns of distributed communication they will use in their systems. The focus in FC is on providing a flexible and extensible language for expressing distributed data flow algorithms, rather than a concrete set of implemented distributed training capabilities.

One of the primary target audiences for TFF's FC API is researchers and practitioners who might want to experiment with new federated learning algorithms and evaluate the consequences of subtle design choices that affect the manner in which the flow of data in the distributed system is orchestrated, yet without getting bogged down by system implementation details. The level of abstraction that the FC API is aiming for roughly corresponds to pseudocode one could use to describe the mechanics of a federated learning algorithm in a research publication - what data exists in the system and how it is transformed, but without dropping to the level of individual point-to-point network message exchanges.

TFF as a whole is targeting scenarios in which data is distributed, and must remain such, e.g., for privacy reasons, and where collecting all data at a centralized location may not be a viable option. This has implications for the implementation of machine learning algorithms, which require an increased degree of explicit control compared to scenarios in which all data can be accumulated in a centralized location at a data center.

Before we start

Before we dive into the code, please try to run the following "Hello World" example to make sure your environment is correctly set up. If it doesn't work, please refer to the Installation guide for instructions.

#@test {"skip": true}

# NOTE: If you are running a Jupyter notebook, and installing a locally built
# pip package, you may need to edit the following to point to the '.whl' file
# on your local filesystem.

!pip install --quiet --upgrade tensorflow_federated
!pip install --quiet --upgrade tf-nightly
from __future__ import absolute_import, division, print_function

import collections

import numpy as np
from six.moves import range
import tensorflow as tf
import tensorflow_federated as tff

tf.enable_resource_variables()

@tff.federated_computation
def hello_world():
  return 'Hello, World!'


hello_world()
'Hello, World!'

Federated data

One of the distinguishing features of TFF is that it allows you to compactly express TensorFlow-based computations on federated data. We will be using the term federated data in this tutorial to refer to a collection of data items hosted across a group of devices in a distributed system. For example, applications running on mobile devices may collect data and store it locally, without uploading to a centralized location. Or, an array of distributed sensors may collect and store temperature readings at their locations.

Federated data like those in the above examples are treated in TFF as first-class citizens, i.e., they may appear as parameters and results of functions, and they have types. To reinforce this notion, we will refer to federated data sets as federated values, or as values of federated types.

The important point to understand is that we are modeling the entire collection of data items across all devices (e.g., the entire collection of temperature readings from all sensors in a distributed array) as a single federated value.

For example, here's how one would define in TFF the type of a federated float hosted by a group of client devices. A collection of temperature readings that materialize across an array of distributed sensors could be modeled as a value of this federated type.

federated_float_on_clients = tff.FederatedType(tf.float32, tff.CLIENTS)

More generally, a federated type in TFF is defined by specifying the type T of its member constituents - the items of data that reside on individual devices, and the group G of devices on which federated values of this type are hosted (plus a third, optional bit of information we'll mention shortly). We refer to the group G of devices hosting a federated value as the value's placement. Thus, tff.CLIENTS is an example of a placement.

str(federated_float_on_clients.member)
'float32'
str(federated_float_on_clients.placement)
'CLIENTS'

A federated type with member constituents T and placement G can be represented compactly as {T}@G, as shown below.

str(federated_float_on_clients)
'{float32}@CLIENTS'

The curly braces {} in this concise notation serve as a reminder that the member constituents (items of data on different devices) may differ, as you would expect e.g., of temperature sensor readings, so the clients as a group are jointly hosting a multi-set of T-typed items that together constitute the federated value.

It is important to note that the member constituents of a federated value are generally opaque to the programmer, i.e., a federated value should not be thought of as a simple dict keyed by an identifier of a device in the system - these values are intended to be collectively transformed only by federated operators that abstractly represent various kinds of distributed communication protocols (such as aggregation). If this sounds too abstract, don't worry - we will return to this shortly, and we will illustrate it with concrete examples.

Federated types in TFF come in two flavors: those where the member constituents of a federated value may differ (as just seen above), and those where they are known to be all equal. This is controlled by the third, optional all_equal parameter in the tff.FederatedType constructor (defaulting to False).

federated_float_on_clients.all_equal
False

A federated type with a placement G in which all of the T-typed member constituents are known to be equal can be compactly represented as T@G (as opposed to {T}@G, that is, with the curly braces dropped to reflect the fact that the multi-set of member constituents consists of a single item).

str(tff.FederatedType(tf.float32, tff.CLIENTS, all_equal=True))
'float32@CLIENTS'
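
The two compact notations can be summarized in a few lines of plain Python. This is only a sketch of the naming convention described above: the class FederatedTypeSketch is hypothetical, it is not part of the TFF API, and it models nothing beyond how the type string is formed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedTypeSketch:
  """Hypothetical stand-in for tff.FederatedType; models only the notation."""
  member: str              # compact name of the member type T, e.g. 'float32'
  placement: str           # name of the placement G, e.g. 'CLIENTS'
  all_equal: bool = False  # mirrors the default described above

  def __str__(self):
    if self.all_equal:
      # All members are known to be equal: braces are dropped.
      return '{}@{}'.format(self.member, self.placement)
    # Members may differ: braces mark a multi-set of T-typed items.
    return '{{{}}}@{}'.format(self.member, self.placement)


print(FederatedTypeSketch('float32', 'CLIENTS'))
print(FederatedTypeSketch('float32', 'CLIENTS', all_equal=True))
```

The first line printed uses the braces and the second does not, matching the two type strings shown above.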

One example of a federated value of such type that might arise in practical scenarios is a hyperparameter (such as a learning rate, a clipping norm, etc.) that has been broadcast by a server to a group of devices that participate in federated training.

Another example is a set of parameters for a machine learning model pre-trained at the server, which were then broadcast to a group of client devices, where they can be personalized for each user.

For example, suppose we have a pair of float32 parameters a and b for a simple one-dimensional linear regression model. We can construct the (non-federated) type of such models for use in TFF as follows. The angle braces <> in the printed type string are a compact TFF notation for named or unnamed tuples.

simple_regression_model_type = (
    tff.NamedTupleType([('a', tf.float32), ('b', tf.float32)]))

str(simple_regression_model_type)
'<a=float32,b=float32>'

Note that we are only specifying dtypes above. Non-scalar types are also supported. In the above code, tf.float32 is a shortcut notation for the more general tff.TensorType(dtype=tf.float32, shape=[]).

When this model is broadcast to clients, the type of the resulting federated value can be represented as shown below.

str(tff.FederatedType(
    simple_regression_model_type, tff.CLIENTS, all_equal=True))
'<a=float32,b=float32>@CLIENTS'

By symmetry with the federated float above, we will refer to such a type as a federated tuple. More generally, we'll often use the term federated XYZ to refer to a federated value in which member constituents are XYZ-like. Thus, we will talk about things like federated tuples, federated sequences, federated models, and so on.

Now, coming back to float32@CLIENTS - while it appears replicated across multiple devices, it is actually a single float32, since all members are the same. In general, you may think of any all-equal federated type, i.e., one of the form T@G, as isomorphic to a non-federated type T, since in both cases, there's actually only a single (albeit potentially replicated) item of type T.

Given the isomorphism between T and T@G, you may wonder what purpose, if any, the latter types might serve. Read on.

Placements

Design Overview

In the preceding section, we've introduced the concept of placements - groups of system participants that might be jointly hosting a federated value, and we've demonstrated the use of tff.CLIENTS as an example specification of a placement.

To explain why the notion of a placement is so fundamental that we needed to incorporate it into the TFF type system, recall what we mentioned at the beginning of this tutorial about some of the intended uses of TFF.

Although in this tutorial, you will only see TFF code being executed locally in a simulated environment, our goal is for TFF to enable writing code that you could deploy for execution on groups of physical devices in a distributed system, potentially including mobile or embedded devices running Android. Each of those devices would receive a separate set of instructions to execute locally, depending on the role it plays in the system (an end-user device, a centralized coordinator, an intermediate layer in a multi-tier architecture, etc.). It is important to be able to reason about which subsets of devices execute what code, and where different portions of the data might physically materialize.

This is especially important when dealing with, e.g., application data on mobile devices. Since the data is private and can be sensitive, we need the ability to statically verify that this data will never leave the device (and prove facts about how the data is being processed). The placement specifications are one of the mechanisms designed to support this.

TFF has been designed as a data-centric programming environment, and as such, unlike some of the existing frameworks that focus on operations and where those operations might run, TFF focuses on data, where that data materializes, and how it's being transformed. Consequently, placement is modeled as a property of data in TFF, rather than as a property of operations on data. Indeed, as you're about to see in the next section, some of the TFF operations span across locations, and run "in the network", so to speak, rather than being executed by a single machine or a group of machines.

Representing the type of a certain value as T@G or {T}@G (as opposed to just T) makes data placement decisions explicit, and together with a static analysis of programs written in TFF, it can serve as a foundation for providing formal privacy guarantees for sensitive on-device data.

An important thing to note at this point, however, is that while we encourage TFF users to be explicit about groups of participating devices that host the data (the placements), the programmer will never deal with the raw data or identities of the individual participants.

(NOTE: While it goes far outside the scope of this tutorial, we should mention that there is one notable exception to the above, a tff.federated_collect operator that is intended as a low-level primitive, only for specialized situations. Its explicit use in situations where it can be avoided is not recommended, as it may limit the possible future applications. For example, if during the course of static analysis, we determine that a computation uses such low-level mechanisms, we may disallow its access to certain types of data.)

Within the body of TFF code, by design, there's no way to enumerate the devices that constitute the group represented by tff.CLIENTS, or to probe for the existence of a specific device in the group. There's no concept of a device or client identity anywhere in the Federated Core API, the underlying set of architectural abstractions, or the core runtime infrastructure we provide to support simulations. All the computation logic you write will be expressed as operations on the entire client group.

Recall here what we mentioned earlier about values of federated types being unlike Python dict, in that one cannot simply enumerate their member constituents. Think of values that your TFF program logic manipulates as being associated with placements (groups), rather than with individual participants.

Placements are designed to be a first-class citizen in TFF as well, and can appear as parameters and results of a placement type (to be represented by tff.PlacementType in the API). In the future, we plan to provide a variety of operators to transform or combine placements, but this is outside the scope of this tutorial. For now, it suffices to think of placement as an opaque primitive built-in type in TFF, similar to how int and bool are opaque built-in types in Python, with tff.CLIENTS being a constant literal of this type, not unlike 1 being a constant literal of type int.

Specifying Placements

TFF provides two basic placement literals, tff.CLIENTS and tff.SERVER, to make it easy to express the rich variety of practical scenarios that are naturally modeled as client-server architectures, with multiple client devices (mobile phones, embedded devices, distributed databases, sensors, etc.) orchestrated by a single centralized server coordinator. TFF is designed to also support custom placements, multiple client groups, multi-tiered and other, more general distributed architectures, but discussing them is outside the scope of this tutorial.

TFF doesn't prescribe what either the tff.CLIENTS or the tff.SERVER actually represent.

In particular, tff.SERVER may be a single physical device (a member of a singleton group), but it might just as well be a group of replicas in a fault-tolerant cluster running state machine replication - we do not make any special architectural assumptions. Rather, we use the all_equal bit mentioned in the preceding section to express the fact that we're generally dealing with only a single item of data at the server.

Likewise, tff.CLIENTS in some applications might represent all clients in the system - what in the context of federated learning we sometimes refer to as the population, but e.g., in production implementations of Federated Averaging, it may represent a cohort - a subset of the clients selected for participation in a particular round of training. The abstractly defined placements are given concrete meaning when a computation in which they appear is deployed for execution (or simply invoked like a Python function in a simulated environment, as is demonstrated in this tutorial). In our local simulations, the group of clients is determined by the federated data supplied as input.

Federated computations

Declaring federated computations

TFF is designed as a strongly-typed functional programming environment that supports modular development.

The basic unit of composition in TFF is a federated computation - a section of logic that may accept federated values as input and return federated values as output. Here's how you can define a computation that calculates the average of the temperatures reported by the sensor array from our previous example.

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):
  return tff.federated_mean(sensor_readings)

Looking at the above code, at this point you might be asking - aren't there already decorator constructs to define composable units such as tf.function in TensorFlow, and if so, why introduce yet another one, and how is it different?

The short answer is that the code generated by the tff.federated_computation wrapper is neither TensorFlow, nor is it Python - it's a specification of a distributed system in an internal platform-independent glue language. At this point, this will undoubtedly sound cryptic, but please bear this intuitive interpretation of a federated computation as an abstract specification of a distributed system in mind. We'll explain it in a minute.

First, let's play with the definition a bit. TFF computations are generally modeled as functions - with or without parameters, but with well-defined type signatures. You can print the type signature of a computation by querying its type_signature property, as shown below.

str(get_average_temperature.type_signature)
'({float32}@CLIENTS -> float32@SERVER)'

The type signature tells us that the computation accepts a collection of different sensor readings on client devices, and returns a single average on the server.

Before we go any further, let's reflect on this for a minute - the input and output of this computation are in different places (on CLIENTS vs. at the SERVER). Recall what we said in the preceding section on placements about how TFF operations may span across locations, and run in the network, and what we just said about federated computations as representing abstract specifications of distributed systems. We have just defined one such computation - a simple distributed system in which data is consumed at client devices, and the aggregate results emerge at the server.

In many practical scenarios, the computations that represent top-level tasks will tend to accept their inputs and report their outputs at the server - this reflects the idea that computations might be triggered by queries that originate and terminate on the server.

However, the FC API does not impose this assumption, and many of the building blocks we use internally (including numerous tff.federated_... operators you may find in the API) have inputs and outputs with distinct placements, so in general, you should not think about a federated computation as something that runs on the server or is executed by a server. The server is just one type of participant in a federated computation. In thinking about the mechanics of such computations, it's best to always default to the global network-wide perspective, rather than the perspective of a single centralized coordinator.

In general, functional type signatures are compactly represented as (T -> U) for types T and U of inputs and outputs, respectively. The type of the formal parameter (such as sensor_readings in this case) is specified as the argument to the decorator. You don't need to specify the type of the result - it's determined automatically.

Although TFF does offer limited forms of polymorphism, programmers are strongly encouraged to be explicit about the types of data they work with, as that makes understanding, debugging, and formally verifying properties of your code easier. In some cases, explicitly specifying types is a requirement (e.g., polymorphic computations are currently not directly executable).

Executing federated computations

In order to support development and debugging, TFF allows you to directly invoke computations defined this way as Python functions, as shown below. Where the computation expects a value of a federated type with the all_equal bit set to False, you can feed it as a plain list in Python, and for federated types with the all_equal bit set to True, you can just directly feed the (single) member constituent. This is also how the results are reported back to you.

get_average_temperature([68.5, 70.3, 69.8])
69.53333
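
For intuition about the result above, the arithmetic can be reproduced in plain Python. This is only an analogy, under the assumption that a mean can be formed from a sum and a count gathered from all clients; it does not show how TFF itself breaks tff.federated_mean down, and the helper names below are hypothetical.

```python
def local_contribution(reading):
  """What a single (hypothetical) client reports: its value and a count of 1."""
  return (reading, 1)


def aggregate_mean(contributions):
  """What the (hypothetical) server computes from all client contributions."""
  total = sum(value for value, _ in contributions)
  count = sum(n for _, n in contributions)
  return total / count


sensor_readings = [68.5, 70.3, 69.8]  # one reading per client
average = aggregate_mean([local_contribution(r) for r in sensor_readings])
print(average)
```

The value printed agrees with the TFF result up to floating-point precision (TFF works in float32 here, whereas plain Python uses double precision).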

When running computations like this in simulation mode, you act as an external observer with a system-wide view, who has the ability to supply inputs and consume outputs at any locations in the network, as indeed is the case here - you supplied client values at input, and consumed the server result.

Now, let's return to a note we made earlier about the tff.federated_computation decorator emitting code in a glue language. Although the logic of TFF computations can be expressed as ordinary functions in Python (you just need to decorate them with tff.federated_computation as we've done above), and you can directly invoke them with Python arguments just like any other Python functions in this notebook, behind the scenes, as we noted earlier, TFF computations are actually not Python.

What we mean by this is that when the Python interpreter encounters a function decorated with tff.federated_computation, it traces the statements in this function's body once (at definition time), and then constructs a serialized representation of the computation's logic for future use - whether for execution, or to be incorporated as a sub-component into another computation.

You can verify this by adding a print statement, as follows:

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(sensor_readings):

  print('Getting traced, the argument is "{}".'.format(
      type(sensor_readings).__name__))

  return tff.federated_mean(sensor_readings)
Getting traced, the argument is "ValueImpl".

You can think of Python code that defines a federated computation similarly to how you would think of Python code that builds a TensorFlow graph in a non-eager context (if you're not familiar with the non-eager uses of TensorFlow, think of your Python code defining a graph of operations to be executed later, but not actually running them on the fly). The non-eager graph-building code in TensorFlow is Python, but the TensorFlow graph constructed by this code is platform-independent and serializable.

Likewise, TFF computations are defined in Python, but the Python statements in their bodies, such as tff.federated_mean in the example we've just shown, are compiled into a portable and platform-independent serializable representation under the hood.

As a developer, you don't need to concern yourself with the details of this representation, as you will never need to directly work with it, but you should be aware of its existence, the fact that TFF computations are fundamentally non-eager, and cannot capture arbitrary Python state. Python code contained in a TFF computation's body is executed at definition time, when the body of the Python function decorated with tff.federated_computation is traced before getting serialized. It's not retraced again at invocation time (except when the function is polymorphic; please refer to the documentation pages for details).
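
The trace-once behavior can be imitated with a small plain-Python decorator. The sketch below is a toy model, not TFF's mechanism: Placeholder, mean_op, and traced_once are hypothetical names, and the "serialized representation" is reduced to a single tuple.

```python
class Placeholder:
  """Symbolic argument handed to the function body during tracing."""

  def __init__(self, name):
    self.name = name


def mean_op(arg):
  """Records a 'mean' operation instead of computing anything."""
  return ('mean', arg.name)


def traced_once(fn):
  """Hypothetical decorator: runs fn's body once and keeps only the recording."""
  recorded = fn(Placeholder('arg'))  # the Python body executes here, at definition time

  def invoke(values):
    # Invocation interprets the recording; fn's Python body is not run again.
    op, _ = recorded
    if op != 'mean':
      raise ValueError('unsupported op: {}'.format(op))
    return sum(values) / len(values)

  return invoke


trace_log = []


@traced_once
def average(readings):
  trace_log.append(type(readings).__name__)  # Python side effect, happens while tracing
  return mean_op(readings)


average([1.0, 2.0, 3.0])
average([4.0, 6.0])
print(trace_log)
```

Even after two invocations, trace_log holds a single entry, because the Python body ran only when the decorator was applied. The entry names the placeholder type rather than a list, which is the toy counterpart of the "ValueImpl" output shown earlier.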

You may wonder why we've chosen to introduce a dedicated internal non-Python representation. One reason is that ultimately, TFF computations are intended to be deployable to real physical environments, and hosted on mobile or embedded devices, where Python may not be available.

Another reason is that TFF computations express the global behavior of distributed systems, as opposed to Python programs which express the local behavior of individual participants. You can see that in the simple example above, with the special operator tff.federated_mean that accepts data on client devices, but deposits the results on the server.

The operator tff.federated_mean cannot be easily modeled as an ordinary operator in Python, since it doesn't execute locally - as noted earlier, it represents a distributed system that coordinates the behavior of multiple system participants. We will refer to such operators as federated operators, to distinguish them from ordinary (local) operators in Python.

The TFF type system, and the fundamental set of operations supported in TFF's language, thus deviate significantly from those in Python, necessitating the use of a dedicated representation.

Composing federated computations

As noted above, federated computations and their constituents are best understood as models of distributed systems, and you can think of composing federated computations as composing more complex distributed systems from simpler ones. You can think of the tff.federated_mean operator as a kind of built-in template federated computation with a type signature ({T}@CLIENTS -> T@SERVER) (indeed, just like computations you write, this operator also has a complex structure - under the hood we break it down into simpler operators).

The same is true of composing federated computations. The computation get_average_temperature may be invoked in a body of another Python function decorated with tff.federated_computation - doing so will cause it to be embedded in the body of the parent, much in the same way tff.federated_mean was embedded in its own body earlier.

An important restriction to be aware of is that bodies of Python functions decorated with tff.federated_computation must consist only of federated operators, i.e., they cannot directly contain TensorFlow operations. For example, you cannot directly use tf.nest interfaces to add a pair of federated values. TensorFlow code must be confined to blocks of code decorated with a tff.tf_computation discussed in the following section. Only when wrapped in this manner can the wrapped TensorFlow code be invoked in the body of a tff.federated_computation.

The reasons for this separation are technical (it's hard to trick operators such as tf.add into working with non-tensors) as well as architectural. The language of federated computations (i.e., the logic constructed from serialized bodies of Python functions decorated with tff.federated_computation) is designed to serve as a platform-independent glue language. This glue language is currently used to build distributed systems from embedded sections of TensorFlow code (confined to tff.tf_computation blocks). In the fullness of time, we anticipate the need to embed sections of other, non-TensorFlow logic, such as relational database queries that might represent input pipelines, all connected together using the same glue language (the tff.federated_computation blocks).

TensorFlow logic

Declaring TensorFlow computations

TFF is designed for use with TensorFlow. As such, the bulk of the code you will write in TFF is likely to be ordinary (i.e., locally-executing) TensorFlow code. In order to use such code with TFF, as noted above, it just needs to be decorated with tff.tf_computation.

For example, here's how we could implement a function that takes a number and adds 0.5 to it.

@tff.tf_computation(tf.float32)
def add_half(x):
  return tf.add(x, 0.5)

Once again, looking at this, you may be wondering why we should define another decorator tff.tf_computation instead of simply using an existing mechanism such as tf.function. Unlike in the preceding section, here we are dealing with an ordinary block of TensorFlow code.

There are a few reasons for this, the full treatment of which goes beyond the scope of this tutorial, but it's worth naming the main two:

  • In order to embed reusable building blocks implemented using TensorFlow code in the bodies of federated computations, they need to satisfy certain properties - such as getting traced and serialized at definition time, having type signatures, etc. This generally requires some form of a decorator.

  • In addition, TFF needs the ability for computations to be able to accept data streams (represented as tf.data.Datasets), such as streams of training example batches in machine learning applications, as either inputs or outputs. This capability currently does not exist in TensorFlow; the tff.tf_computation decorator offers partial (and for now still experimental) support for it.

In general, we recommend using TensorFlow's native mechanisms for composition, such as tf.function, wherever possible, as the exact manner in which TFF's decorator interacts with eager functions can be expected to evolve.

Now, coming back to the example code snippet above, the computation add_half we just defined can be treated by TFF just like any other TFF computation. In particular, it has a TFF type signature.

str(add_half.type_signature)
'(float32 -> float32)'

Note this type signature does not have placements. TensorFlow computations cannot consume or return federated types.

You can now also use add_half as a building block in other computations. For example, here's how you can use the tff.federated_map operator to apply add_half pointwise to all member constituents of a federated float on client devices.

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def add_half_on_clients(x):
  return tff.federated_map(add_half, x)
str(add_half_on_clients.type_signature)
'({float32}@CLIENTS -> {float32}@CLIENTS)'

Executing TensorFlow computations

Execution of computations defined with tff.tf_computation follows the same rules as those we described for tff.federated_computation. They can be invoked as ordinary callables in Python, as follows.

add_half_on_clients([1.0, 3.0, 2.0])
[1.5, 3.5, 2.5]

Once again, it is worth noting that invoking the computation add_half_on_clients in this manner simulates a distributed process. Data is consumed on clients, and returned on clients. Indeed, this computation has each client perform a local action. There is no tff.SERVER explicitly mentioned in this system (even if in practice, orchestrating such processing might involve one). Think of a computation defined this way as conceptually analogous to the Map stage in MapReduce.
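
To make the Map analogy concrete, here is a minimal plain-Python sketch. It does not use TFF at all; map_on_clients is a hypothetical helper introduced only for illustration, and the list stands in for the values held by the individual clients.

```python
# Plain-Python analogy only; no TFF is involved here.
def add_half(x):
    return x + 0.5


def map_on_clients(fn, client_values):
    """Hypothetical helper: apply fn independently to each client's local value."""
    return [fn(value) for value in client_values]


client_values = [1.0, 3.0, 2.0]  # one entry per simulated client
result = map_on_clients(add_half, client_values)
print(result)  # [1.5, 3.5, 2.5]
```

Each element is processed independently and nothing is aggregated, which is why no server-side step appears in the sketch.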

Also, keep in mind that what we said in the preceding section about TFF computations getting serialized at the definition time remains true for tff.tf_computation code as well - the Python body of add_half_on_clients gets traced once at definition time. On subsequent invocations, TFF uses its serialized representation.
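
The "traced once at definition time" behavior can be illustrated with a toy decorator in plain Python. This is only a sketch of the general idea, not how TFF is implemented; Placeholder and traced_once are hypothetical names, and the toy supports addition only.

```python
class Placeholder:
    """Toy symbolic value: records additions instead of computing them."""

    def __init__(self, ops=()):
        self.ops = tuple(ops)

    def __add__(self, other):
        return Placeholder(self.ops + (("add", other),))


def traced_once(fn):
    """Hypothetical decorator: run fn's Python body once and keep the recorded ops."""
    recorded = fn(Placeholder()).ops  # the body executes here, at decoration time

    def run(value):
        # Later calls replay the recorded ops; fn's body is not entered again.
        for op, operand in recorded:
            if op == "add":
                value = value + operand
        return value

    return run


body_runs = []


@traced_once
def add_half(x):
    body_runs.append("traced")  # Python side effect, happens only during tracing
    return x + 0.5


print(add_half(1.0), add_half(3.0))  # 1.5 3.5
print(len(body_runs))  # 1
```

Even though add_half is called twice, its Python body ran only once, when the decorator was applied.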

The only difference between Python methods decorated with tff.federated_computation and those decorated with tff.tf_computation is that the latter are serialized as TensorFlow graphs (whereas the former are not allowed to contain TensorFlow code directly embedded in them).

Under the hood, each method decorated with tff.tf_computation temporarily disables eager execution in order to allow the computation's structure to be captured. While eager execution is locally disabled, you are welcome to use eager TensorFlow, AutoGraph, TensorFlow 2.0 constructs, etc., so long as you write the logic of your computation in a manner such that it can get correctly serialized.

For example, the following code will fail:

try:

  # Eager mode
  constant_10 = tf.constant(10.)

  @tff.tf_computation(tf.float32)
  def add_ten(x):
    return x + constant_10

except Exception as err:
  print(err)
Tensor("Const:0", shape=(), dtype=float32) must be from the same graph as Tensor("arg:0", shape=(), dtype=float32).

The above fails because constant_10 has already been constructed outside of the graph that tff.tf_computation constructs internally in the body of add_ten during the serialization process.

On the other hand, it is fine to invoke Python functions that modify the current graph when they are called inside a tff.tf_computation:

def get_constant_10():
  return tf.constant(10.)

@tff.tf_computation(tf.float32)
def add_ten(x):
  return x + get_constant_10()

add_ten(5.0)
15.0
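
The difference between the failing and the working variant comes down to which graph is active when the constant is created. The following toy model in plain Python sketches that idea; Graph, Tensor, and serialize are hypothetical stand-ins written for this illustration and are not the real TensorFlow or TFF classes.

```python
class Graph:
    """Toy stand-in for a graph; tracks which graph is currently active."""

    _stack = []

    def __enter__(self):
        Graph._stack.append(self)
        return self

    def __exit__(self, *exc):
        Graph._stack.pop()

    @staticmethod
    def current():
        return Graph._stack[-1]


Graph._stack.append(Graph())  # the outer ("eager") context in this toy model


class Tensor:
    def __init__(self, name):
        self.name = name
        self.graph = Graph.current()  # bound to whichever graph is active now

    def __add__(self, other):
        if other.graph is not self.graph:
            raise ValueError(
                f"{other.name} must be from the same graph as {self.name}")
        return Tensor(f"add({self.name}, {other.name})")


def serialize(fn):
    """Trace fn inside a fresh graph, as a decorator might at definition time."""
    with Graph():
        return fn(Tensor("arg"))


constant_10 = Tensor("Const")  # created up front, in the outer graph

try:
    serialize(lambda x: x + constant_10)
    outcome = "ok"
except ValueError as err:
    outcome = str(err)


def get_constant_10():
    return Tensor("Const")  # created in whichever graph is active at call time


traced = serialize(lambda x: x + get_constant_10())
print(outcome)  # Const must be from the same graph as arg
print(traced.name)  # add(arg, Const)
```

In the second variant the constant is created while the fresh graph is active, so both operands of the addition belong to the same graph.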

Note that the serialization mechanisms in TensorFlow are evolving, and we expect the details of how TFF serializes computations to evolve as well.

Working with tf.data.Datasets

As noted earlier, a unique feature of tff.tf_computations is that they allow you to work with tf.data.Datasets defined abstractly as formal parameters by your code. Parameters to be represented in TensorFlow as data sets need to be declared using the tff.SequenceType constructor.

For example, the type specification tff.SequenceType(tf.float32) defines an abstract sequence of float elements in TFF. Sequences can contain either tensors, or complex nested structures (we'll see examples of those later). The concise representation of a sequence of T-typed items is T*.

float32_sequence = tff.SequenceType(tf.float32)

str(float32_sequence)
'float32*'

Suppose that in our temperature sensor example, each sensor holds not just one temperature reading, but multiple. Here's how you can define a TFF computation in TensorFlow that calculates the average of temperatures in a single local data set using the tf.data.Dataset.reduce operator.

@tff.tf_computation(tff.SequenceType(tf.float32))
def get_local_temperature_average(local_temperatures):
  sum_and_count = (
      local_temperatures.reduce((0.0, 0), lambda x, y: (x[0] + y, x[1] + 1)))
  return sum_and_count[0] / tf.cast(sum_and_count[1], tf.float32)
str(get_local_temperature_average.type_signature)
'(float32* -> float32)'

In the body of a method decorated with tff.tf_computation, formal parameters of a TFF sequence type are represented simply as objects that behave like tf.data.Dataset, i.e., support the same properties and methods (they are currently not implemented as subclasses of that type - this may change as the support for data sets in TensorFlow evolves).

You can easily verify this as follows.

@tff.tf_computation(tff.SequenceType(tf.int32))
def foo(x):
  return x.reduce(np.int32(0), lambda x, y: x + y)

foo([1, 2, 3])
6

Keep in mind that unlike ordinary tf.data.Datasets, these dataset-like objects are placeholders. They don't contain any elements, since they represent abstract sequence-typed parameters, to be bound to concrete data when used in a concrete context. Support for abstractly-defined placeholder data sets is still somewhat limited at this point, and in the early days of TFF, you may encounter certain restrictions, but we won't need to worry about them in this tutorial (please refer to the documentation pages for details).

When locally executing a computation that accepts a sequence in a simulation mode, such as in this tutorial, you can feed the sequence as a Python list, as below (as well as in other ways, e.g., as a tf.data.Dataset in eager mode, but for now, we'll keep it simple).

get_local_temperature_average([68.5, 70.3, 69.8])
69.53333
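
To see what the reduction in get_local_temperature_average does step by step, here is a plain-Python sketch that mirrors it with functools.reduce. It is an analogy under the assumption that the sequence behaves like an ordinary iterable; local_temperature_average is a name introduced here for illustration.

```python
from functools import reduce


def local_temperature_average(readings):
    # The accumulator is a (running_sum, running_count) pair seeded with (0.0, 0),
    # matching the initial state passed to the dataset reduction.
    total, count = reduce(
        lambda acc, y: (acc[0] + y, acc[1] + 1), readings, (0.0, 0))
    return total / float(count)


avg = local_temperature_average([68.5, 70.3, 69.8])
print(avg)
```

Note that functools.reduce takes the initial state as its last argument, whereas tf.data.Dataset.reduce takes it first.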

Like all other TFF types, sequences like those defined above can use the tff.NamedTupleType constructor to define nested structures. For example, here's how one could declare a computation that accepts a sequence of pairs A, B, and returns the sum of their products. We include the tracing statements in the body of the computation so that you can see how the TFF type signature translates into the dataset's output_types and output_shapes.

@tff.tf_computation(tff.SequenceType(collections.OrderedDict([('A', tf.int32), ('B', tf.int32)])))
def foo(ds):
  print('output_types = {}, shapes = {}'.format(
      tf.compat.v1.data.get_output_types(ds),
      tf.compat.v1.data.get_output_shapes(ds)))
  return ds.reduce(np.int32(0), lambda total, x: total + x['A'] * x['B'])
output_types = OrderedDict([('A', tf.int32), ('B', tf.int32)]), shapes = OrderedDict([('A', TensorShape([])), ('B', TensorShape([]))])
str(foo.type_signature)
'(<A=int32,B=int32>* -> int32)'
foo([{'A': 2, 'B': 3}, {'A': 4, 'B': 5}])
26
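
For reference, the arithmetic behind this result can be reproduced in plain Python. This is only a sketch of the same reduction over ordinary dictionaries; it does not involve TFF or tf.data.

```python
import collections
from functools import reduce

# Each element is a named pair, mirroring the <A=int32,B=int32> element type.
pairs = [
    collections.OrderedDict([("A", 2), ("B", 3)]),
    collections.OrderedDict([("A", 4), ("B", 5)]),
]

# Start from 0 and add the product A * B of each element: 2*3 + 4*5.
total = reduce(lambda acc, x: acc + x["A"] * x["B"], pairs, 0)
print(total)  # 26
```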

The support for using tf.data.Datasets as formal parameters is still somewhat limited and evolving, although functional in simple scenarios such as those used in this tutorial.

Putting it all together

Now, let's try again to use our TensorFlow computation in a federated setting. Suppose we have a group of sensors that each have a local sequence of temperature readings. We can compute the global temperature average by averaging the sensors' local averages as follows.

@tff.federated_computation(
    tff.FederatedType(tff.SequenceType(tf.float32), tff.CLIENTS))
def get_global_temperature_average(sensor_readings):
  return tff.federated_mean(
      tff.federated_map(get_local_temperature_average, sensor_readings))

Note that this isn't a simple average across all local temperature readings from all clients, as that would require weighting contributions from different clients by the number of readings they locally maintain. We leave it as an exercise for the reader to update the above code; the tff.federated_mean operator accepts the weight as an optional second argument (expected to be a federated float).
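
The difference between the two kinds of average is easy to see in plain Python. The sketch below uses sample readings chosen for illustration and does not show the TFF solution to the exercise; it only compares the mean of local means with the mean weighted by the number of readings per client.

```python
# Hypothetical sample data: one inner list of readings per client.
sensor_readings = [[68.0, 70.0], [71.0], [68.0, 72.0, 70.0]]

local_averages = [sum(r) / len(r) for r in sensor_readings]
weights = [float(len(r)) for r in sensor_readings]  # readings held by each client

# Mean of local means: every client counts equally.
unweighted = sum(local_averages) / len(local_averages)

# Weighted mean: every individual reading counts equally.
weighted = sum(a * w for a, w in zip(local_averages, weights)) / sum(weights)

print(local_averages)  # [69.0, 71.0, 70.0]
print(unweighted)  # 70.0
print(round(weighted, 4))
```

The weighted value equals the plain average over all six readings, which is slightly lower than the mean of local means because the client with a single reading of 71.0 no longer counts as much as the others.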

Also note that the input to get_global_temperature_average now becomes a federated float sequence. Federated sequences are how we will typically represent on-device data in federated learning, with sequence elements typically representing data batches (you will see examples of this shortly).

str(get_global_temperature_average.type_signature)
'({float32*}@CLIENTS -> float32@SERVER)'

Here's how we can locally execute the computation on a sample of data in Python. Notice that the way we supply the input is now as a list of lists. The outer list iterates over the devices in the group represented by tff.CLIENTS, and the inner ones iterate over elements in each device's local sequence.

get_global_temperature_average([[68.0, 70.0], [71.0], [68.0, 72.0, 70.0]])
70.0

This concludes the first part of the tutorial. We encourage you to continue on to the second part.