Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

We present a novel analysis of the expected risk of the weighted majority vote in multiclass classification. The analysis takes the correlation of predictions by ensemble members into account and yields a bound that is amenable to efficient minimization, producing improved weightings for the majority vote. We also provide a specialized version of the bound for binary classification, which makes it possible to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimizing the new bound typically does not degrade the test error of the ensemble.
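To make the objects in the abstract concrete, the following is a minimal sketch of a weighted majority vote and of a second order (pairwise) error statistic that captures the correlation of ensemble members' mistakes. The function names and the exact form of the pairwise statistic are illustrative assumptions for exposition, not the paper's definitions.

```python
import numpy as np

def weighted_majority_vote(preds, rho):
    """Weighted majority vote over an ensemble.

    preds: (m, n) integer array; row h holds member h's class predictions
           on n examples.
    rho:   (m,) nonnegative weights summing to 1.
    Returns the (n,) array of majority-vote predictions.
    """
    m, n = preds.shape
    n_classes = preds.max() + 1
    votes = np.zeros((n_classes, n))
    for h in range(m):
        votes[preds[h], np.arange(n)] += rho[h]
    return votes.argmax(axis=0)

def pairwise_error(preds, y, rho):
    """Second order error statistic (illustrative).

    Empirical probability that two members drawn independently from rho
    BOTH err on the same example -- this is what distinguishes a second
    order analysis from a first order one, which only uses the average
    individual error and ignores correlations.
    """
    errs = (preds != y).astype(float)   # (m, n) indicator of mistakes
    rho_err = rho @ errs                # rho-weighted error per example
    return np.mean(rho_err ** 2)        # E_x[(E_{h~rho} 1[h errs])^2]
```

Intuitively, an ensemble whose members make errors on *different* examples has a small pairwise error even when individual errors are moderate, which is why a bound built on this quantity can remain tight where a first order bound is loose.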
