Minimax Rates of Estimating Approximate Differential Privacy

Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of errors could be greatly reduced by a data-driven approach that verifies privacy guarantees using only black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantee and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance. With $n$ samples, we show that this estimator achieves the performance of a straightforward plug-in estimator with $n \ln n$ samples, a phenomenon referred to as effective sample size amplification. The minimax optimality of the proposed estimator is proved by comparing it to a matching fundamental lower bound.
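
For orientation, the plug-in baseline mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator proposed in the paper: it assumes a discrete output alphabet, a fixed $\varepsilon$, and that the quantity of interest is $\delta = \sum_x \max(P(x) - e^{\varepsilon} Q(x), 0)$, the standard characterization of $(\varepsilon, \delta)$-differential privacy, where $P$ and $Q$ are the mechanism's output distributions on two neighboring databases. The function name `plugin_delta` and its arguments are hypothetical.

```python
import math
from collections import Counter


def plugin_delta(samples_p, samples_q, eps):
    """Plug-in estimate of sum_x max(P(x) - e^eps * Q(x), 0).

    samples_p, samples_q: non-empty sequences of hashable outputs drawn by
    querying the black-box mechanism on two neighboring databases
    (an assumed setup for this sketch).
    """
    n_p, n_q = len(samples_p), len(samples_q)
    counts_p, counts_q = Counter(samples_p), Counter(samples_q)
    total = 0.0
    # Only symbols observed under P can contribute a positive term.
    for x, count in counts_p.items():
        p_hat = count / n_p                   # empirical P(x)
        q_hat = counts_q.get(x, 0) / n_q      # empirical Q(x), 0 if unseen
        total += max(p_hat - math.exp(eps) * q_hat, 0.0)
    return total


# Toy usage with hand-made samples (not real mechanism output).
estimate = plugin_delta([0, 0, 0, 1], [0, 1, 1, 1], eps=0.0)
```

The sketch simply substitutes empirical frequencies into the formula. Because $\max(\cdot, 0)$ is non-smooth at zero, this substitution is biased for symbols where $P(x)$ is close to $e^{\varepsilon} Q(x)$, which is the regime the polynomial approximation described in the abstract is meant to address.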
