Dependence measures bounding the exploration bias for general measurements

We propose a framework to analyze and quantify the bias introduced by adaptive data analysis. It generalizes the framework of Russo and Zou '15, applying to measurements whose moment generating function exists, measurements with a finite p-norm, and measurements in general Orlicz spaces. We introduce a new class of dependence measures that retain key properties of mutual information while quantifying the exploration bias more effectively for heavy-tailed distributions. We give examples in which our bounds are nearly tight, in settings where the original framework of Russo and Zou '15 does not apply.
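For orientation, the baseline result being generalized can be stated as follows (a paraphrase of the sub-Gaussian bound of Russo and Zou '15; the notation below is introduced here for illustration rather than taken from the abstract). Let T be a data-dependent choice among measurements X_1, ..., X_n with means \mu_1, ..., \mu_n, and suppose each X_i is \sigma-sub-Gaussian. Then

\[
  \bigl|\,\mathbb{E}[X_T] - \mathbb{E}[\mu_T]\,\bigr| \;\le\; \sigma \sqrt{2\, I(T; X_1, \ldots, X_n)},
\]

where I(\cdot\,;\cdot) denotes mutual information, so the exploration bias is controlled by how much the selection rule depends on the data. The framework described in the abstract targets precisely the settings where the sub-Gaussian assumption (and hence this bound) fails, such as heavy-tailed measurements with only a finite p-norm or a finite Orlicz norm.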

[1] J. Ziv et al., On functionals satisfying a data-processing theorem, IEEE Transactions on Information Theory, 1973.

[2] G. Lugosi et al., Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[3] S. Verdú et al., Cumulant generating function of codeword lengths in optimal lossless compression, IEEE International Symposium on Information Theory, 2014.

[4] T. Weissman et al., Information Measures: The Curious Case of the Binary Alphabet, IEEE Transactions on Information Theory, 2014.

[5] A. Lapidoth et al., Two Measures of Dependence, IEEE International Conference on the Science of Electrical Engineering (ICSEE), 2016.

[6] M. D. Reid et al., Tighter Variational Representations of f-Divergences via Restriction to Probability Measures, ICML, 2012.

[7] R. Sibson, Information radius, 1969.

[8] L. Haan et al., Extreme Value Theory: An Introduction, 2006.

[9] P. E. Latham et al., Mutual Information, 2006.

[10] L. Maligranda et al., Amemiya norm equals Orlicz norm in general, 2000.

[11] P. Dupuis et al., Robust Bounds on Risk-Sensitive Functionals via Rényi Divergence, SIAM/ASA Journal on Uncertainty Quantification, 2013.

[12] L. Schmetterer, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 1963.

[13] S. M. Ali et al., A General Class of Coefficients of Divergence of One Distribution from Another, 1966.

[14] G. L. Gilardoni, On a Gel'fand-Yaglom-Peres theorem for f-divergences, arXiv, 2009.

[15] D. Russo and J. Zou, Controlling Bias in Adaptive Data Analysis Using Information Theory, AISTATS, 2015.

[16] M. J. Wainwright et al., Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, IEEE Transactions on Information Theory, 2008.