Chromatic PAC-Bayes Bounds for Non-IID Data

PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, and this is particularly true for margin classifiers: recent contributions have shown how practical these bounds can be, whether for performing model selection (Ambroladze et al., 2007) or for directly guiding the learning of linear classifiers (Germain et al., 2009). However, in many practical situations the training data exhibit dependencies and the traditional IID assumption does not hold. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first, to the best of our knowledge, PAC-Bayes generalization bounds for classifiers trained on data exhibiting interdependencies. Our approach rests on decomposing a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data by means of graph fractional covers. The resulting bounds are very general: an upper bound on the fractional chromatic number of the dependency graph suffices to derive new PAC-Bayes bounds for specific settings. We show how our results can be used to derive bounds for ranking statistics (such as the AUC) and for classifiers trained on data distributed according to a stationary $\beta$-mixing process. Along the way, we show how our approach seamlessly allows us to deal with U-processes. As a side note, we also provide a PAC-Bayes generalization bound for classifiers learned on data from stationary $\varphi$-mixing distributions.
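
To make the role of the fractional chromatic number concrete, here is a hedged sketch, not the exact statement proved in the paper, of the form such a bound can be expected to take. Recall the classical IID PAC-Bayes bound in the Seeger/Langford form: for any prior $P$ and posterior $Q$ over classifiers and an $m$-sample, with probability at least $1-\delta$,
$$\mathrm{kl}\bigl(\hat{e}_Q \,\|\, e_Q\bigr) \;\le\; \frac{1}{m}\Bigl[\mathrm{KL}(Q\|P) + \ln\frac{m+1}{\delta}\Bigr],$$
where $\mathrm{kl}$ denotes the binary KL divergence between the empirical and true risks of the Gibbs classifier. If $\chi$ is any upper bound on the fractional chromatic number $\chi^*_f(\Gamma)$ of the dependency graph $\Gamma$, the fractional cover decomposition effectively shrinks the sample size from $m$ to $m/\chi$, and one can expect a bound of the shape
$$\mathrm{kl}\bigl(\hat{e}_Q \,\|\, e_Q\bigr) \;\le\; \frac{\chi}{m}\Bigl[\mathrm{KL}(Q\|P) + \ln\frac{m/\chi+1}{\delta}\Bigr],$$
with the precise constants and conditions given by the main theorems.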

[1] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.

[2] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[3] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the Method of Bounded Differences, 1989.

[4] M. Seeger. The Proof of McAllester's PAC-Bayesian Theorem, 2002.

[5] Bin Yu. Rates of Convergence for Empirical Processes of Stationary Mixing Sequences, 1994.

[6] Thore Graepel, et al. A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs Work, 2000, NIPS.

[7] W. Hoeffding. A Class of Statistics with Asymptotically Normal Distribution, 1948.

[8] Jean-Yves Audibert, et al. Combining PAC-Bayesian and Generic Chaining Bounds, 2007, J. Mach. Learn. Res.

[9] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 2013.

[10] François Laviolette, et al. PAC-Bayesian Learning of Linear Classifiers, 2009, ICML '09.

[11] O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 2007, arXiv:0712.0248.

[12] Gunnar Rätsch, et al. Soft Margins for AdaBoost, 2001, Machine Learning.

[13] P. Bartlett, et al. Local Rademacher Complexities, 2005, arXiv:math/0508275.

[14] Matthias W. Seeger, et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification, 2003, J. Mach. Learn. Res.

[15] K. Ramanan, et al. Concentration Inequalities for Dependent Random Variables via the Martingale Method, 2006, arXiv:math/0609835.

[16] Massih-Reza Amini, et al. Generalization Error Bounds for Classifiers Trained with Interdependent Data, 2005, NIPS.

[17] P. Gallinari, et al. A Data-Dependent Generalisation Error Bound for the AUC, 2005.

[18] Shivani Agarwal, et al. Generalization Bounds for Ranking Algorithms via Algorithmic Stability, 2009, J. Mach. Learn. Res.

[19] Alain Rakotomamonjy, et al. Optimizing Area Under ROC Curve with SVMs, 2004, ROCAI.

[20] Ulf Brefeld, et al. AUC Maximizing Support Vector Learning, 2005.

[21] Gilles Blanchard, et al. Occam's Hammer, 2006, COLT.

[22] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[23] László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[24] Mehryar Mohri, et al. AUC Optimization vs. Error Rate Minimization, 2003, NIPS.

[25] François Laviolette, et al. PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier, 2006, NIPS.

[26] William Nick Street, et al. Learning to Rank by Maximizing AUC with Linear Programming, 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[27] G. Lugosi, et al. Ranking and Empirical Minimization of U-Statistics, 2006, arXiv:math/0603123.

[28] Svante Janson, et al. Large Deviations for Sums of Partly Dependent Random Variables, 2004, Random Struct. Algorithms.

[29] Sriram V. Pemmaraju, et al. Equitable Colorings Extend Chernoff-Hoeffding Bounds, 2001, SODA '01.

[30] Mehryar Mohri, et al. Rademacher Complexity Bounds for Non-I.I.D. Processes, 2008, NIPS.

[31] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.

[32] Dan Roth, et al. Generalization Bounds for the Area Under the ROC Curve, 2005, J. Mach. Learn. Res.

[33] David A. McAllester. Simplified PAC-Bayesian Margin Bounds, 2003, COLT.

[34] John Shawe-Taylor, et al. PAC-Bayes and Margins, 2003.

[35] E. Scheinerman, et al. Fractional Graph Theory: A Rational Approach to the Theory of Graphs, 1997.

[36] J. Langford. Tutorial on Practical Prediction Theory for Classification, 2005, J. Mach. Learn. Res.

[37] John Shawe-Taylor, et al. Tighter PAC-Bayes Bounds, 2006, NIPS.