Enhancing the Privacy of Federated Learning with Sketching

In response to growing concerns about user privacy, federated learning has emerged as a promising tool to train statistical models over networks of devices while keeping data localized. Federated learning methods run training tasks directly on user devices and do not share the raw user data with third parties. However, current methods still share model updates, which may contain private information (e.g., one's weight and height), during the training process. Existing efforts that aim to improve the privacy of federated learning make compromises in one or more of the following key areas: performance (particularly communication cost), accuracy, or privacy. To better optimize these trade-offs, we propose that \textit{sketching algorithms} have a unique advantage in that they can provide both privacy and performance benefits while maintaining accuracy. We evaluate the feasibility of sketching-based federated learning with a prototype on three representative learning models. Our initial findings show that it is possible to provide strong privacy guarantees for federated learning without sacrificing performance or accuracy. Our work highlights that there exists a fundamental connection between privacy and communication in distributed settings, and suggests important open problems surrounding the theoretical understanding, methodology, and system design of practical, private federated learning.

[1]  Vyas Sekar,et al.  Privacy for Free: Communication-Efficient Learning with Differential Privacy Using Sketches , 2019, ArXiv.

[2]  Yan Chen,et al.  Reversible sketches for efficient and accurate change detection over network data streams , 2004, IMC '04.

[3]  Tong Yang,et al.  SketchML: Accelerating Distributed Machine Learning with Data Sketches , 2018, SIGMOD Conference.

[4]  Hubert Eichner,et al.  Federated Learning for Mobile Keyboard Prediction , 2018, ArXiv.

[5]  Hubert Eichner,et al.  Towards Federated Learning at Scale: System Design , 2019, SysML.

[6]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[7]  Emiliano De Cristofaro,et al.  Efficient Private Statistics with Succinct Sketches , 2015, NDSS.

[8]  Michael O. Rabin,et al.  How To Exchange Secrets with Oblivious Transfer , 2005, IACR Cryptol. ePrint Arch..

[9]  Graham Cormode,et al.  A near-optimal algorithm for estimating the entropy of a stream , 2010, TALG.

[10]  Anit Kumar Sahu,et al.  Federated Learning: Challenges, Methods, and Future Directions , 2019, IEEE Signal Processing Magazine.

[11]  Philip S. Yu,et al.  On Privacy-Preservation of Text and Sparse Binary Data with Sketches , 2007, SDM.

[12]  Gregory Cohen,et al.  EMNIST: an extension of MNIST to handwritten letters , 2017, CVPR 2017.

[13]  Gaurav Kapoor,et al.  Protection Against Reconstruction and Its Applications in Private Federated Learning , 2018, ArXiv.

[14]  Paul Gaynor,et al.  Distributed face recognition using collaborative judgement aggregation in a swarm of tiny wireless sensor nodes , 2015, SoutheastCon 2015.

[15]  Sebastian Caldas,et al.  LEAF: A Benchmark for Federated Settings , 2018, ArXiv.

[16]  Vladimir Braverman,et al.  One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon , 2016, SIGCOMM.

[17]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[18]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[19]  Sanjoy Dasgupta,et al.  An elementary proof of a theorem of Johnson and Lindenstrauss , 2003, Random Struct. Algorithms.

[20]  Somesh Jha,et al.  Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.

[21]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[22]  Srinivas Devadas,et al.  Intel SGX Explained , 2016, IACR Cryptol. ePrint Arch..

[23]  Moses Charikar,et al.  Finding frequent items in data streams , 2004, Theor. Comput. Sci..

[24]  Noga Alon,et al.  The Space Complexity of Approximating the Frequency Moments , 1999 .

[25]  H. Brendan McMahan,et al.  Learning Differentially Private Recurrent Language Models , 2017, ICLR.

[26]  Noga Alon,et al.  The Probabilistic Method , 2015, Fundamentals of Ramsey Theory.

[27]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[28]  Ohad Shamir,et al.  Communication-Efficient Distributed Optimization using an Approximate Newton-type Method , 2013, ICML.

[29]  Úlfar Erlingsson,et al.  The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks , 2018, USENIX Security Symposium.

[30]  Sanjiv Kumar,et al.  cpSGD: Communication-efficient and differentially-private distributed SGD , 2018, NeurIPS.

[31]  Vinod Vaikuntanathan,et al.  Fully Homomorphic Encryption from Ring-LWE and Security for Key Dependent Messages , 2011, CRYPTO.

[32]  Vyas Sekar,et al.  An empirical evaluation of entropy-based traffic anomaly detection , 2008, IMC '08.

[33]  Tassilo Klein,et al.  Differentially Private Federated Learning: A Client Level Perspective , 2017, ArXiv.

[34]  Wenliang Du,et al.  Secure multi-party computation problems and their applications: a review and open problems , 2001, NSPW '01.

[35]  Sarvar Patel,et al.  Practical Secure Aggregation for Privacy-Preserving Machine Learning , 2017, IACR Cryptol. ePrint Arch..

[36]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.