Robust Decentralized Differentially Private Stochastic Gradient Descent

Stochastic gradient descent (SGD) is one of the most widely applied machine learning algorithms in unreliable, large-scale decentralized environments. In such environments, data privacy is a fundamental concern, and the most popular framework for studying it is differential privacy. However, many important implementation details of differentially private SGD variants, as well as their performance, have not yet been fully addressed. Here, we analyze a set of distributed differentially private SGD implementations in a system where every private data record is stored separately by an autonomous node. The examined SGD methods use only local computations, and the messages they exchange contain only information protected in a differentially private manner. A key middleware service these implementations require is the single random walk service, which maintains a single random walk in the face of different failure scenarios. We first propose a robust implementation of the decentralized single random walk service, and then perform experiments to evaluate both the proposed random walk service and the private SGD implementations. Our main conclusion is that the proposed differentially private SGD implementations can approximate the performance of their original noise-free variants in faulty decentralized environments, provided the algorithm parameters are set properly.
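To make the setting concrete, the following is a minimal sketch of one way such a scheme could look, not the implementation evaluated in this work. It assumes a logistic-regression model, a failure-free overlay in which the next node is chosen uniformly at random, and the standard Laplace mechanism applied to an L1-clipped gradient. All function names and parameters (`clip_l1`, `private_gradient`, `random_walk_sgd`, `epsilon`, `clip`, `lr`) are hypothetical, and the sketch does not track how the privacy budget composes when a node is visited repeatedly.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponential variates with mean `scale`
    # is a zero-mean Laplace variate with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip_l1(vec, bound):
    # Rescale vec so that its L1 norm is at most `bound`.
    norm = sum(abs(v) for v in vec)
    if norm <= bound:
        return list(vec)
    return [v * bound / norm for v in vec]


def private_gradient(w, x, y, epsilon, clip, rng):
    # Logistic-loss gradient on one local record (label y in {-1, +1}).
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    margin = max(-30.0, min(30.0, margin))  # avoid overflow in exp
    coeff = -y / (1.0 + math.exp(margin))
    grad = clip_l1([coeff * xi for xi in x], clip)
    # Two clipped gradients differ by at most 2 * clip in L1 norm, so
    # Laplace noise of scale 2 * clip / epsilon gives epsilon-differential
    # privacy for a single release (assumption: one release per visit).
    scale = 2.0 * clip / epsilon
    return [g + laplace(scale, rng) for g in grad]


def random_walk_sgd(records, steps, epsilon, clip=1.0, lr=0.1, seed=0):
    # records[i] = (x, y) is the single record held by node i.
    rng = random.Random(seed)
    w = [0.0] * len(records[0][0])
    node = rng.randrange(len(records))
    for t in range(1, steps + 1):
        x, y = records[node]
        # Only the noisy gradient leaves the node's local computation.
        g = private_gradient(w, x, y, epsilon, clip, rng)
        eta = lr / math.sqrt(t)  # decaying step size
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        # The walk hops to a uniformly random node (idealized overlay).
        node = rng.randrange(len(records))
    return w
```

A call such as `random_walk_sgd([([1.0, 0.0], 1), ([0.0, 1.0], -1)], steps=50, epsilon=1.0)` returns the model held by the walk after 50 hops. In a faulty network the walk itself can be lost or duplicated, which is the problem a robust single random walk service has to handle and which this sketch leaves out.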
