Relative deviation learning bounds and generalization with unbounded loss functions

We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and a discussion of their implications. We further prove two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that some moment of the loss is bounded. Finally, we illustrate how these results apply in a sample application: the analysis of importance weighting.
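The importance weighting setting the abstract refers to can be illustrated with a minimal sketch. The estimator below reweights losses on samples drawn from a source distribution Q by the density ratio w(x) = p(x)/q(x) to estimate the expected loss under a target distribution P; the Gaussian densities, the squared loss (unbounded, but with bounded moments under P), and all function names here are illustrative assumptions, not the paper's construction.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def importance_weighted_risk(samples, loss, p_pdf, q_pdf):
    """(1/m) * sum_i w(x_i) * loss(x_i), with w = p/q.

    An unbiased estimate of E_P[loss] from samples drawn i.i.d. from Q.
    """
    return sum(p_pdf(x) / q_pdf(x) * loss(x) for x in samples) / len(samples)

random.seed(0)
q_pdf = lambda x: normal_pdf(x, 0.0, 1.0)   # source Q = N(0, 1)
p_pdf = lambda x: normal_pdf(x, 0.5, 1.0)   # target P = N(0.5, 1)
loss = lambda x: x ** 2                     # unbounded loss, E_P[loss^2] < infinity

samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]
est = importance_weighted_risk(samples, loss, p_pdf, q_pdf)
# True target risk: E_P[x^2] = mu^2 + sigma^2 = 1.25
```

Because the loss is unbounded, classical bounded-loss generalization bounds do not apply to this estimator directly; bounds of the kind studied in the paper, which require only a bounded moment of the (weighted) loss, are the natural tool here.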
