Per-instance Differential Privacy and the Adaptivity of Posterior Sampling in Linear and Ridge Regression

Differential privacy (DP) has, ever since its advent, been a subject of controversy. On the one hand, it provides strong, provable protection for individuals in a data set; on the other hand, it has been heavily criticized as impractical, in part because it is completely independent of the actual data set it is meant to protect. In this paper, we address this issue with a new and more fine-grained notion of differential privacy: per-instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. We show that pDP is a strict generalization of standard DP and inherits all of its desirable properties, e.g., composition, invariance to side information, and closure under post-processing, except that these now hold for every instance separately. When the data are drawn from a distribution, we show that per-instance DP implies generalization. Moreover, we provide explicit calculations of the per-instance DP of output perturbation for a class of smooth learning problems. The result reveals an interesting and intuitive fact: an individual enjoys stronger privacy if he/she has a small "leverage score" with respect to the data set and if he/she can be predicted more accurately using the leave-one-out data set. Using the techniques developed, we provide a novel analysis of the One-Posterior-Sample (OPS) estimator and show that, when the data set is well-conditioned, it provides $(\epsilon,\delta)$-pDP for any target individual and matches the exact lower bound up to a $1+\tilde{O}(n^{-1}\epsilon^{-2})$ multiplicative factor. We also propose AdaOPS, which uses adaptive regularization to achieve the same results with $(\epsilon,\delta)$-DP. Simulations show a privacy-utility trade-off that is several orders of magnitude more favorable when we consider the privacy of only the users in the data set.
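To make the two quantities in the abstract concrete, below is a minimal NumPy sketch, not taken from the paper: the function names, the toy data, and the fixed choices of `gamma` and `lam` are ours for illustration. It computes (i) the ridge leverage scores $h_i = x_i^\top (X^\top X + \lambda I)^{-1} x_i$ that govern an individual's per-instance privacy loss, and (ii) one draw from the Gibbs posterior $p(\theta) \propto \exp\!\big(-\gamma(\tfrac{1}{2}\|y - X\theta\|^2 + \tfrac{\lambda}{2}\|\theta\|^2)\big)$, which for ridge regression is Gaussian with mean equal to the ridge estimate and covariance $\gamma^{-1}(X^\top X + \lambda I)^{-1}$; this is the form of the OPS release. Calibrating $\gamma$ to a target $(\epsilon,\delta)$-pDP level, and AdaOPS's data-adaptive choice of $\lambda$, follow the paper's analysis and are not reproduced here.

```python
import numpy as np


def leverage_scores(X, lam=0.0):
    """Ridge leverage scores h_i = x_i^T (X^T X + lam*I)^{-1} x_i.

    In the paper's analysis, an individual with a small leverage score
    enjoys a stronger per-instance DP guarantee.
    """
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)
    # Solve H Z^T = X^T once rather than forming an explicit inverse.
    Z = np.linalg.solve(H, X.T).T          # shape (n, d)
    return np.einsum("ij,ij->i", X, Z)     # row-wise inner products


def one_posterior_sample(X, y, gamma, lam=0.0, seed=None):
    """Release a single draw from the Gibbs posterior of ridge regression.

    The posterior is Gaussian: mean is the ridge estimate
    (X^T X + lam*I)^{-1} X^T y, covariance is (1/gamma) (X^T X + lam*I)^{-1}.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)
    theta_ridge = np.linalg.solve(H, X.T @ y)   # posterior mean
    cov = np.linalg.inv(H) / gamma              # posterior covariance
    return rng.multivariate_normal(theta_ridge, cov)


# Toy usage: n = 200 points in d = 5 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
theta_star = rng.standard_normal(5)
y = X @ theta_star + 0.1 * rng.standard_normal(200)

h = leverage_scores(X, lam=1.0)
theta_ops = one_posterior_sample(X, y, gamma=1.0, lam=1.0, seed=1)
print("max leverage score:", h.max())
print("OPS sample:", np.round(theta_ops, 3))
```

Note the role of $\gamma$: larger values concentrate the posterior around the ridge estimate (better utility, weaker privacy), while larger $\lambda$ flattens the leverage scores, which is the lever AdaOPS adjusts adaptively.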
