Notes from the 8/11/2022 meeting of TFF collaborators

  • Proposed agenda topic: Jeremy Lewi will present his TFF-based ideas for new components that could be built
  • [JL] Focusing on simple federated analytics scenarios, connecting TFF with Google Sheets to do simple federated averaging. Working in Kubernetes, reading from Sheets.
  • [JL] One challenge is that currently workers are required to have ingress points.
    • This is often not the case, so need a transport layer that enables connectivity to be established in the opposite direction, workers calling a server.
    • Such a component is not currently in the ecosystem.
  • [BC] Also saw the need for this. Currently using TFF in a limited fashion, in-house cloud where clients upload data. But, would need something like JL described above to migrate to a multi-datacenter setting.
  • [JL] Thinking of a layer that would enable workers to “pull” work items from a queue on a server - should it replace the existing runtime?
  • [KO] Don’t have to think of this in terms of “replacing” - you can keep the computation authoring and 98% of the runtime the same, and you’d just swap in the new component that works the way you propose instead of the remote executor as a mechanism for relaying executor requests top down.
  • [BC] Would you need it to be async, or would it work within the existing sync paradigm?
  • [BC] Also, some of the existing platforms do use a “queue of tasks” approach, so this sounds like an established idea.
  • [BC] Introducing timeouts would also perhaps help to bridge the gap (to deal with slow workers or stragglers).
  • [KO] With respect to sync vs. async, we have collective abstractions in TFF that require the notion of a “cohort”. As such, there needs to be a time when some of the clients out there decide together to join a “cohort”, and the server would need to play a role in orchestrating this. As long as that’s done, the manner in which individual executor requests are relayed to clients can then vary. A remote executor that calls top-down is one way to go about it, but not the only one; a work-item-based communication pattern like what was proposed above could also definitely fit into this structure. Seems like material for a small one- or two-page proposal for someone to draft?
  • [JL] Volunteering to write down a proposal for a new component for us all to iterate on.
  • [JL] BTW, are there other adjacent repos with related functionality?
  • [KO] FYI, https://github.com/google/federated-compute also from Google, but that’s mostly focusing on a mobile scenario, it’s not connected to TFF at this point, and doesn’t contain the functionality you’re describing here, so it definitely makes sense to try and formulate a small proposal in this group.
  • [BD] Some questions to address: caching results, when to aggregate.
  • [Hao] Perhaps don’t need caching in this scenario if it’s not async
  • [KO] For scenarios that fit a simple MapReduce pattern, we do have some support in TFF, see https://www.tensorflow.org/federated/api_docs/python/tff/backends/mapreduce. This library enables you to translate TFF computations into a MapReduce-like form that you could execute on a simpler platform. However, there’s some loss in expressiveness, and some of the ideas discussed earlier that required multiple rounds of back-and-forth communication between server and clients wouldn’t be expressible in this framework. And, the cross-silo setting uniquely makes those types of ideas possible, since we’re dealing with groups of well-provisioned clients (silos) that can maintain long-lasting connections.
  • [Hao] What about collective ops, allreduce - are those supported or compatible?
  • [KO] Not currently. Allreduce would have somewhat limited use, in that while it could be leveraged in a single fed avg scenario, it assumes no work is happening on the server in between rounds of processing. Won’t work in more general cases. But, having the two halves of it - efficient mode of broadcasting and efficient mode of aggregating, perhaps even with hardware acceleration, would be something we can take advantage of in TFF.
  • [KO] Sounds like JL is up for kicking off a draft of a proposal for a new component, and others have opinions for what should be in it - let’s collaborate (+1 from all in the room). To reconvene in 2 weeks, possibly with a draft to discuss.
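
Illustrative sketch (not from the meeting): one way the pull-based pattern discussed above could look. Workers call in to the server, a cohort is formed first, work items are then queued per worker, and results are aggregated after a deadline. This is not TFF code and not the proposal JL volunteered to draft. `PullCoordinator`, `WorkItem`, `run_round`, and all method names are hypothetical; the transport is replaced by in-process method calls, and the timeout is modeled only by aggregating whatever results have arrived.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Optional


@dataclass
class WorkItem:
    """One unit of work queued for a worker (hypothetical shape)."""
    round_id: int
    payload: float  # e.g. a broadcast server value


class PullCoordinator:
    """Server-side queue: workers call in, the server never calls out."""

    def __init__(self, cohort_size: int):
        self.cohort_size = cohort_size
        self.cohort: List[str] = []
        self.pending: Dict[str, Deque[WorkItem]] = {}
        self.results: Dict[int, Dict[str, float]] = {}

    def join(self, worker_id: str) -> bool:
        """A worker asks to join the cohort; admitted only while there is room."""
        if worker_id in self.cohort or len(self.cohort) >= self.cohort_size:
            return False
        self.cohort.append(worker_id)
        self.pending[worker_id] = deque()
        return True

    def broadcast(self, round_id: int, payload: float) -> None:
        """Queue one work item for every cohort member."""
        for worker_id in self.cohort:
            self.pending[worker_id].append(WorkItem(round_id, payload))

    def pull(self, worker_id: str) -> Optional[WorkItem]:
        """Called by a worker; returns its next work item, or None."""
        queue = self.pending.get(worker_id)
        return queue.popleft() if queue else None

    def push_result(self, worker_id: str, round_id: int, value: float) -> None:
        """Called by a worker to report its result for a round."""
        self.results.setdefault(round_id, {})[worker_id] = value

    def aggregate(self, round_id: int) -> float:
        """Unweighted mean over results received so far (stragglers are dropped)."""
        values = list(self.results.get(round_id, {}).values())
        if not values:
            raise ValueError(f"no results for round {round_id}")
        return sum(values) / len(values)


def run_round(coordinator: PullCoordinator,
              local_data: Dict[str, List[float]],
              round_id: int,
              server_value: float,
              stragglers: Iterable[str] = ()) -> float:
    """Simulate one synchronous round; stragglers never report before the deadline."""
    slow = set(stragglers)
    coordinator.broadcast(round_id, server_value)
    for worker_id, data in local_data.items():
        if worker_id in slow:
            continue
        item = coordinator.pull(worker_id)
        if item is None:  # worker is not in the cohort
            continue
        coordinator.push_result(worker_id, item.round_id, sum(data) / len(data))
    return coordinator.aggregate(round_id)
```

In this sketch the cohort step and the aggregation step stay synchronous, as KO described, and only the direction of the calls changes. A real component would still need a transport, authentication, and a real deadline.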