Private Weighted Random Walk Stochastic Gradient Descent

We consider a decentralized learning setting in which data is distributed over the nodes of a graph. The goal is to learn a global model on the distributed data without involving any central entity that needs to be trusted. While gossip-based stochastic gradient descent (SGD) can be used to achieve this objective, it incurs high communication and computation costs. To speed up convergence, we instead study random-walk-based SGD, in which the global model is updated along a random walk on the graph. We propose two algorithms, based on two types of random walks, that achieve uniform sampling and importance sampling of the data in a fully decentralized way. We provide a non-asymptotic analysis of the convergence rate that accounts for the constants related to the data and the graph. Our numerical results show that the weighted random-walk algorithm performs better on high-variance data. Moreover, we propose a privacy-preserving random-walk algorithm that achieves local differential privacy via a Gamma noise mechanism that we introduce. We also give numerical results on the convergence of this algorithm and show that it outperforms additive Laplace-based privacy mechanisms.
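
To make the idea concrete, the sketch below shows one standard way to simulate such a random-walk SGD: a Metropolis-Hastings-adjusted walk whose stationary distribution is either uniform or a set of importance weights, with each local gradient reweighted so that the update remains unbiased. The function names, the uniform-over-neighbors proposal, and the 1/(n·π_i) reweighting are illustrative assumptions; the paper's exact transition rule, weight design, and step-size schedule may differ.

```python
import numpy as np

def metropolis_hastings_transition(neighbors, target):
    """Transition matrix whose stationary distribution is `target`.

    neighbors: dict mapping node -> list of neighbors (undirected graph)
    target:    length-n array of desired stationary probabilities (sums to 1)
    """
    n = len(target)
    P = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            # Uniform proposal over the current node's neighbors,
            # corrected by the Metropolis-Hastings acceptance ratio.
            prop_ij = 1.0 / len(neighbors[i])
            prop_ji = 1.0 / len(neighbors[j])
            P[i, j] = prop_ij * min(1.0, (target[j] * prop_ji) / (target[i] * prop_ij))
        P[i, i] = 1.0 - P[i].sum()  # lazy self-loop keeps each row stochastic
    return P

def random_walk_sgd(grads, neighbors, target, x0, steps, lr=0.1, seed=0):
    """Simulate SGD along a random walk over the graph (illustrative sketch).

    grads:  list of callables; grads[i](x) is the stochastic gradient at node i
    target: stationary distribution of the walk (uniform or importance weights)
    """
    rng = np.random.default_rng(seed)
    P = metropolis_hastings_transition(neighbors, target)
    n, x, node = len(target), np.array(x0, dtype=float), 0
    for _ in range(steps):
        # Reweight by 1 / (n * pi_i) so the update is unbiased under pi
        # (an assumed correction, not necessarily the paper's exact rule).
        x -= lr * grads[node](x) / (n * target[node])
        node = rng.choice(n, p=P[node])  # hand the model to a random neighbor
    return x
```

Setting `target` to the uniform distribution yields the uniform-sampling variant, while setting it proportional to per-node importance weights yields the weighted (importance-sampling) variant; in both cases only the node currently holding the model computes a gradient and communicates with a single neighbor, which is what keeps the communication cost low compared to gossip.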
