The structure of optimal private tests for simple hypotheses

Hypothesis testing plays a central role in statistical inference and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions P and Q and a privacy level ε, how many i.i.d. samples are needed to distinguish P from Q subject to ε-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level ε, and show that it is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma in the setting of private hypothesis testing. We also give an application of our result to private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.
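To make the idea of a randomized, clamped log-likelihood ratio test concrete, the following is a minimal sketch and not necessarily the exact test analyzed in this work. It assumes finite distributions given as dictionaries with strictly positive probabilities, a symmetric clamping range [-clamp, clamp], Laplace noise as the source of randomness, and a decision threshold of 0. All function and parameter names are hypothetical.

```python
import math
import random


def clamped_llr_statistic(samples, p, q, clamp):
    """Sum of per-sample log-likelihood ratios, each clamped to [-clamp, clamp].

    Assumes p[x] > 0 and q[x] > 0 for every sample x (illustrative assumption).
    """
    total = 0.0
    for x in samples:
        llr = math.log(p[x] / q[x])
        total += max(-clamp, min(clamp, llr))
    return total


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_clamped_llr_test(samples, p, q, epsilon, clamp, rng=None):
    """Return "P" or "Q" from a noisy, clamped log-likelihood ratio.

    Replacing one sample changes the clamped sum by at most 2 * clamp, so
    Laplace noise of scale 2 * clamp / epsilon makes the noisy statistic
    epsilon-differentially private; thresholding it is post-processing.
    The threshold of 0 is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    stat = clamped_llr_statistic(samples, p, q, clamp)
    noisy = stat + laplace_noise(2.0 * clamp / epsilon, rng)
    return "P" if noisy > 0 else "Q"
```

Clamping bounds the influence of any single sample on the statistic, which is what allows noise of a fixed scale to provide the privacy guarantee. For example, `private_clamped_llr_test(samples, {0: 0.9, 1: 0.1}, {0: 0.1, 1: 0.9}, epsilon=1.0, clamp=1.0)` returns a label for a list of 0/1 samples.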
