Calibrating Noise to Variance in Adaptive Data Analysis

Datasets are often used multiple times, and each successive analysis may depend on the outcomes of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data distribution, where each query may depend arbitrarily on the answers to previous queries. The strongest results obtained for this problem rely on differential privacy, a strong notion of algorithmic stability with the important property that it "composes" well when data is reused. However, the notion is rather strict, as it requires stability under replacement of an arbitrary data element. The simplest algorithm of this kind adds Gaussian (or Laplace) noise to distort the empirical answers; analyzing this technique via differential privacy, however, yields suboptimal accuracy guarantees when the queries have low variance. Here we propose a relaxed notion of stability that also composes adaptively. We demonstrate that a simple and natural algorithm, which adds noise scaled to the standard deviation of the query, satisfies our notion of stability. This implies an algorithm that can answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries. The only previous approach that provides such accuracy guarantees is based on a more involved differentially private median-of-means algorithm, and its analysis exploits the stronger "group" stability of that algorithm.
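The mechanism described above can be illustrated with a minimal sketch: answer a statistical query by its empirical mean plus Gaussian noise whose scale is proportional to the query's empirical standard deviation (divided by the square root of the sample size), rather than to its worst-case sensitivity. The function name `answer_query` and the scaling parameter `t` are illustrative assumptions, not the paper's exact calibration or notation.

```python
import numpy as np

def answer_query(data, phi, t=1.0, rng=None):
    """Sketch of variance-calibrated noise addition (illustrative only).

    `phi` is a statistical query mapping a data point to a real value;
    `t` is an assumed knob trading accuracy for stability.
    """
    rng = rng or np.random.default_rng()
    values = np.array([phi(x) for x in data])
    n = len(values)
    empirical_mean = values.mean()
    empirical_std = values.std(ddof=1)
    # Noise scaled to std/sqrt(n): a low-variance query is distorted
    # far less than under worst-case (sensitivity-based) calibration.
    noise = rng.normal(0.0, t * empirical_std / np.sqrt(n))
    return empirical_mean + noise

# Example: a low-variance query on 1000 uniform samples gets an
# answer close to the true expectation 0.5.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)
ans = answer_query(data, lambda x: x, rng=rng)
```

For uniform samples on [0, 1], the added noise has standard deviation roughly 0.29/√1000 ≈ 0.009, so the noisy answer stays within a few hundredths of the true mean.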
