Risk upper bounds for general ensemble methods with an application to multiclass classification

This paper generalizes a pivotal result from the PAC-Bayesian literature, the C-bound, which was primarily designed for binary classification, to the general case of ensemble methods whose voters have arbitrary outputs. We provide a generic version of the C-bound, an upper bound on the risk of models expressed as a weighted majority vote that is based on the first and second statistical moments of the vote's margin. On the one hand, this bound may be applied to outputs more complex than mere binary ones, such as multiclass and multilabel outputs; on the other hand, it allows us to consider margin relaxations. We provide a specialization of the bound to multiclass classification, together with empirical evidence that the presented bound is tight with respect to the risk of the majority vote classifier. We also give insights into how the proposed bound may be used to characterize the risk of multilabel predictors.
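As a rough illustration of the quantities involved, the sketch below computes an empirical C-bound for a weighted majority vote over multiclass voters. It assumes the multiclass margin on an example is the total weight the vote puts on the true label minus the largest weight it puts on any other label, and uses the form familiar from the binary case, C = 1 - (E[M])^2 / E[M^2], which follows from the one-sided Chebyshev (Cantelli) inequality whenever E[M] > 0. Since the vote can only err when the margin is at most 0, this quantity upper-bounds the vote's risk. The function names (`vote_distribution`, `multiclass_margin`, `c_bound`, `majority_vote_risk`) and the toy data are hypothetical; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def vote_distribution(votes: Sequence[int], weights: Sequence[float], n_classes: int) -> List[float]:
    """Total weight Q(c) that the weighted voters assign to each class c on one example."""
    q = [0.0] * n_classes
    for label, w in zip(votes, weights):
        q[label] += w
    return q


def multiclass_margin(q: Sequence[float], y: int) -> float:
    """Assumed margin: weight on the true label minus the largest weight on any other label."""
    best_other = max(qc for c, qc in enumerate(q) if c != y)
    return q[y] - best_other


def c_bound(margins: Sequence[float]) -> float:
    """Empirical C-bound: 1 - (first moment)^2 / (second moment) of the margin."""
    n = len(margins)
    m1 = sum(margins) / n
    m2 = sum(m * m for m in margins) / n
    if m1 <= 0:
        # Cantelli's inequality only yields a bound when the mean margin is positive.
        raise ValueError("the C-bound requires a positive mean margin")
    return 1.0 - m1 * m1 / m2


def majority_vote_risk(all_votes, weights, labels, n_classes) -> float:
    """Fraction of examples on which the weighted majority vote (ties -> lowest class index) is wrong."""
    errors = 0
    for votes, y in zip(all_votes, labels):
        q = vote_distribution(votes, weights, n_classes)
        prediction = max(range(n_classes), key=lambda c: q[c])
        # An error implies margin <= 0, so the risk never exceeds P(margin <= 0).
        errors += int(prediction != y)
    return errors / len(labels)


# Hypothetical toy ensemble: 3 voters with posterior weights, 3 classes, 4 examples.
weights = [0.5, 0.3, 0.2]
all_votes = [[0, 0, 1], [1, 1, 1], [2, 0, 2], [1, 1, 2]]  # class predicted by each voter
labels = [0, 1, 2, 2]

margins = [multiclass_margin(vote_distribution(v, weights, 3), y) for v, y in zip(all_votes, labels)]
risk = majority_vote_risk(all_votes, weights, labels, 3)
bound = c_bound(margins)
print(f"majority vote risk = {risk:.2f}, empirical C-bound = {bound:.2f}")
```

On these four toy examples the vote errs once, so the empirical risk is 0.25, while the empirical C-bound is about 0.74. Because Cantelli's inequality also holds for the empirical distribution over the sample, the empirical C-bound always dominates the empirical majority vote risk whenever the mean margin is positive; a bound on the true risk would additionally require a PAC-Bayesian estimate of both moments.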
