Dynamic Differential Privacy for ADMM-Based Distributed Classification Learning

Privacy-preserving distributed machine learning is becoming increasingly important with the rapid growth of data. This paper focuses on a class of regularized empirical risk minimization problems and develops two methods for providing differential privacy to distributed learning algorithms over a network. We first decentralize the learning algorithm using the alternating direction method of multipliers (ADMM), and then propose dual variable perturbation and primal variable perturbation to provide dynamic differential privacy. Both mechanisms yield algorithms with privacy guarantees under mild convexity and differentiability conditions on the loss function and the regularizer. We study the performance of the algorithms and show that dual variable perturbation outperforms its primal counterpart. To design an optimal privacy mechanism, we analyze the fundamental tradeoff between privacy and accuracy and provide guidelines for choosing the privacy parameters. Numerical experiments on a customer information database corroborate the results on the privacy-utility tradeoff and the mechanism design.
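To make the dual variable perturbation mechanism concrete, the sketch below implements a simplified version of consensus ADMM for L2-regularized logistic regression over a network: before each local primal update, a node perturbs its dual variable with noise, so the iterate it shares with its neighbors is a randomized function of its private data. This is an illustrative sketch under stated assumptions, not the paper's exact algorithm: the paper calibrates the noise distribution to a per-iteration privacy budget, whereas here `noise_scale` is a hand-picked Gaussian scale, and the local minimization is approximated by a few gradient steps. All names (`dvp_admm`, `local_grad`, `rho`, `noise_scale`) are hypothetical.

```python
# Minimal sketch: consensus ADMM for L2-regularized logistic regression
# over a network, with dual variable perturbation (DVP). Hypothetical
# simplification of the paper's method: fixed-scale Gaussian noise
# stands in for noise calibrated to a privacy budget, and each primal
# subproblem is solved approximately by gradient descent.
import numpy as np

rng = np.random.default_rng(0)

def local_grad(w, X, y, C):
    # Gradient of (1/m) * sum_k log(1 + exp(-y_k x_k.w)) + (C/2)||w||^2.
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))          # sigmoid(-z)
    return -(X * (y * s)[:, None]).mean(axis=0) + C * w

def dvp_admm(data, adj, d, rho=1.0, C=0.1, noise_scale=0.05,
             iters=50, inner=25, lr=0.1):
    """data: list of (X_i, y_i) per node; adj: neighbor lists; d: dim."""
    n = len(data)
    w = [np.zeros(d) for _ in range(n)]    # primal variables
    lam = [np.zeros(d) for _ in range(n)]  # dual variables
    for _ in range(iters):
        w_prev = [wi.copy() for wi in w]
        for i, (X, y) in enumerate(data):
            deg = max(len(adj[i]), 1)
            nbr_avg = sum(w_prev[j] for j in adj[i]) / deg
            # DVP step: perturb the dual variable before the primal
            # update, so the shared iterate is randomized.
            lam_noisy = lam[i] + noise_scale * rng.standard_normal(d)
            target = 0.5 * (w_prev[i] + nbr_avg)   # consensus anchor
            for _ in range(inner):                  # approximate argmin
                g = (local_grad(w[i], X, y, C) + lam_noisy
                     + 2.0 * rho * deg * (w[i] - target))
                w[i] = w[i] - lr * g
        for i in range(n):                          # dual ascent step
            nbr_avg = sum(w[j] for j in adj[i]) / max(len(adj[i]), 1)
            lam[i] += 0.5 * rho * len(adj[i]) * (w[i] - nbr_avg)
    return w

# Example: three nodes on a line graph with synthetic 2-D data.
if __name__ == "__main__":
    data = [(rng.standard_normal((40, 2)),
             rng.choice([-1.0, 1.0], size=40)) for _ in range(3)]
    models = dvp_admm(data, adj=[[1], [0, 2], [1]], d=2)
    print([np.round(m, 3) for m in models])
```

A faithful implementation would draw the perturbation from the distribution the paper prescribes and tie its scale to the per-iteration privacy parameter, which is exactly the privacy-accuracy tradeoff the abstract refers to: larger noise strengthens the privacy guarantee but degrades the learned classifier.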
