Paper Information - Concentration Bounds for High Sensitivity Functions Through Differential Privacy

A new line of work [Dwork et al. STOC 2015], [Hardt and Ullman FOCS 2014], [Steinke and Ullman COLT 2015], [Bassily et al. STOC 2016] demonstrates how differential privacy [Dwork et al. TCC 2006] can be used as a mathematical tool for guaranteeing generalization in adaptive data analysis. Specifically, if a differentially private analysis is applied to a sample S of i.i.d. examples to select a low-sensitivity function f, then with high probability f(S) is close to its expectation, even though f is chosen based on the data. Very recently, Steinke and Ullman observed that these generalization guarantees can be used to prove concentration bounds in the non-adaptive setting, where the low-sensitivity function is fixed beforehand. In particular, they obtain alternative proofs of classical concentration bounds for low-sensitivity functions, such as the Chernoff bound and McDiarmid's Inequality. In this work, we set out to examine the situation for high-sensitivity functions, for which differential privacy does not imply generalization guarantees under adaptive analysis. We show that differential privacy can nevertheless be used to prove concentration bounds for such functions in the non-adaptive setting.

Kobbi Nissim | Uri Stemmer

[1] Raef Bassily, et al. Typicality-Based Stability and Privacy, 2016, ArXiv.

[2] Van H. Vu, et al. Concentration of non‐Lipschitz functions and applications, 2002, Random Struct. Algorithms.

[3] Aryeh Kontorovich, et al. Concentration in unbounded metric spaces and algorithmic stability, 2013, ICML.

[4] Toniann Pitassi, et al. Preserving Statistical Validity in Adaptive Data Analysis, 2014, STOC.

[5] Aaron Roth, et al. Adaptive Learning with Robust Generalization Guarantees, 2016, COLT.

[6] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[7] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[8] Jonathan Ullman, et al. Preventing False Discovery in Interactive Data Analysis Is Hard, 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[9] Thomas Steinke, et al. Subgaussian Tail Bounds via Stability Arguments, 2017, ArXiv.

[10] Toniann Pitassi, et al. Generalization in Adaptive Data Analysis and Holdout Reuse, 2015, NIPS.

[11] Philip N. Klein, et al. On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms, 2015, SIAM J. Comput.

[12] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation, 2006, EUROCRYPT.

[13] Kunal Talwar, et al. Mechanism Design via Differential Privacy, 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[14] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.

[15] Raef Bassily, et al. Algorithmic stability for adaptive data analysis, 2015, STOC.

[16] Philip N. Klein, et al. On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms, 1999, SIAM J. Comput.

[17] Van H. Vu, et al. Divide and conquer martingales and the number of triangles in a random graph, 2004, Random Struct. Algorithms.

[18] Thomas Steinke, et al. Interactive fingerprinting codes and the hardness of preventing false discovery, 2014, 2016 Information Theory and Applications Workshop (ITA).