On Sparse Linear Regression in the Local Differential Privacy Model

In this paper, we study the sparse linear regression problem under the Local Differential Privacy (LDP) model. We first show that a polynomial dependency on the dimensionality $p$ of the space is unavoidable for the estimation error in both the non-interactive and the sequential interactive local models if the privacy of the whole dataset needs to be preserved. Similar limitations also exist for other types of error measurements and in relaxed local models. This indicates that differential privacy in high-dimensional space is unlikely to be achievable for this problem. With this limitation in mind, we then present two algorithmic results. The first is a sequential interactive LDP algorithm for the low-dimensional sparse case, called Locally Differentially Private Iterative Hard Thresholding (LDP-IHT), which achieves a near-optimal upper bound. This algorithm is rather general and can be used to solve several other problems, such as (Local) DP-ERM with sparsity constraints and sparse regression with non-linear measurements. The second is for the restricted (high-dimensional) case where only the privacy of the responses (labels) needs to be preserved. For this case, we show that the optimal rate of the estimation error can be made logarithmically dependent on $p$ (i.e., $\log p$) in the local model, where the upper bound is obtained by a label-privacy version of LDP-IHT. Experiments on real-world and synthetic datasets confirm our theoretical analysis.
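To make the idea of a sequential interactive, noise-perturbed iterative hard thresholding scheme concrete, the following is a minimal sketch and not the paper's LDP-IHT algorithm. It assumes a squared loss, per-user gradient clipping, Gaussian perturbation, and disjoint user groups (one per round); the function names `hard_threshold` and `noisy_iht` and all parameters are hypothetical. In particular, `noise_std` is a free parameter and is not calibrated to any privacy budget, so the code as written carries no formal privacy guarantee.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argsort(np.abs(v))[-s:]
        out[keep] = v[keep]
    return out


def noisy_iht(X, y, s, n_rounds=10, step=0.5, clip=1.0, noise_std=0.1, seed=0):
    """Illustrative noisy IHT in which each user reports in exactly one round.

    Assumption: noise_std is chosen by hand. It is NOT calibrated to an
    (epsilon, delta) budget, so this is not a private algorithm as written.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    # Disjoint user groups, one per round (sequential interaction).
    groups = np.array_split(rng.permutation(n), n_rounds)
    for idx in groups:
        if idx.size == 0:
            continue  # more rounds than users: nothing to aggregate
        # Per-user squared-loss gradients at the current iterate.
        residual = X[idx] @ theta - y[idx]
        grads = X[idx] * residual[:, None]
        # Clip each user's gradient to L2 norm at most `clip`.
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        # Each user perturbs their own report before sending it.
        reports = grads + rng.normal(0.0, noise_std, size=grads.shape)
        # Server: average the reports, take a gradient step, then threshold.
        theta = hard_threshold(theta - step * reports.mean(axis=0), s)
    return theta
```

Calling `noisy_iht(X, y, s)` on an `n × p` design matrix returns a vector with at most `s` nonzero entries. Using each user in only one round is a design choice of this sketch that keeps every user's data touched once; how the noise must scale with the privacy parameters is exactly what the paper's analysis addresses and is left out here.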
