Jointly Learning from Decentralized (Federated) and Centralized Data to Mitigate Distribution Shift

With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm in which learning takes place collectively on edge devices, each holding a cache of user-generated training examples that remain resident on the local device. These on-device training examples are gathered in situ over the course of users' interactions with their devices, and are therefore highly reflective of at least part of the inference data distribution. Yet a distribution shift may still exist: the on-device training examples may lack some data inputs expected to be encountered at inference time. This paper proposes a way to mitigate this shift: selective usage of datacenter data, mixed in with FL. By mixing decentralized (federated) and centralized (datacenter) data, we can form an effective training data distribution that better matches the inference data distribution, producing more useful models while still meeting the private training data access constraints imposed by FL.
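The abstract does not spell out the exact mixing algorithm, so the following is only a minimal NumPy sketch of one plausible reading: the datacenter dataset joins each FedAvg round as an extra, explicitly weighted participant, so its examples can cover parts of the input space the on-device caches never see. All names here (`local_sgd`, `mixed_fedavg_round`, `dc_weight`) and the toy linear-regression setup are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.1, epochs=1):
    """A few full-batch gradient steps on one participant's data (linear regression)."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 / len(X) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

def mixed_fedavg_round(w, client_data, datacenter_data, dc_weight=0.2):
    """One FedAvg round where the datacenter shard is averaged in as an extra participant.

    dc_weight is a hypothetical mixing knob: the datacenter update is given
    dc_weight times the total client weight in the weighted average."""
    updates, weights = [], []
    for X, y in client_data:                      # decentralized (federated) data
        updates.append(local_sgd(w, X, y))
        weights.append(float(len(X)))
    Xc, yc = datacenter_data                      # centralized (datacenter) data
    updates.append(local_sgd(w, Xc, yc))
    weights.append(dc_weight * sum(weights))
    weights = np.array(weights)
    return sum(wt * u for wt, u in zip(weights / weights.sum(), updates))

# Toy distribution shift: on-device data never exercises the last two features,
# so federated training alone cannot recover those parameters.
d = 5
w_true = rng.normal(size=d)
clients = []
for _ in range(10):
    X = rng.normal(size=(50, d))
    X[:, 3:] = 0.0                                # on-device inputs miss dims 3-4
    clients.append((X, X @ w_true))
Xdc = rng.normal(size=(200, d))                   # datacenter data covers every dimension
dc = (Xdc, Xdc @ w_true)

w = np.zeros(d)
for _ in range(100):
    w = mixed_fedavg_round(w, clients, dc)
print("parameter error:", np.linalg.norm(w - w_true))
```

In this toy setup the error in the last two coordinates shrinks only through the datacenter shard's contribution, which is the intuition behind mixing: centralized data fills in regions of the input space that the decentralized caches do not cover.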
