Learning a priori constrained weighted majority votes

Weighted majority votes allow one to combine the output of several classifiers or voters. MinCq is a recent algorithm that optimizes the weight of each voter by minimizing a theoretical bound on the risk of the vote, with elegant PAC-Bayesian generalization guarantees. However, while it has demonstrated good performance when combining weak classifiers, MinCq cannot make use of the useful a priori knowledge that one may have when using a mixture of weak and strong voters. In this paper, we propose P-MinCq, an extension of MinCq that can incorporate such knowledge in the form of a constraint over the distribution of the weights, along with general proofs of convergence that hold in the sample compression setting for data-dependent voters. The approach is applied to a vote of $k$-NN classifiers with a specific modeling of the voters' performance. P-MinCq significantly outperforms the classic $k$-NN classifier, a symmetric NN and MinCq using the same voters. We show that it is also competitive with LMNN, a popular metric learning algorithm, and that combining both approaches further reduces the error.
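To make the setting concrete, the sketch below builds a weighted majority vote over a family of $k$-NN voters whose weight distribution is fixed by an a priori profile. It only illustrates the kind of constrained vote the paper studies; it does not implement MinCq's quadratic program, P-MinCq's constraint handling, or the PAC-Bayesian bound, and the prior profile, dataset, and function names are assumptions made for illustration.

```python
# Minimal sketch (assumed setup): a weighted majority vote of k-NN voters
# whose weights follow a hand-picked a priori profile. P-MinCq would instead
# learn the weights under such a constraint by minimizing a PAC-Bayesian bound.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labels in {-1, +1}, as usual in the PAC-Bayes majority-vote setting.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
y = 2 * y - 1
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ks = [1, 3, 5, 7, 9, 11]  # one k-NN voter per value of k (assumed grid)
voters = [KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr) for k in ks]

# A priori constraint on the weight distribution: here we simply posit that
# more local voters (smaller k) get larger weight, normalized to sum to 1.
prior_profile = np.array([1.0 / k for k in ks])
weights = prior_profile / prior_profile.sum()

def majority_vote(voters, weights, X):
    """Sign of the weighted sum of the voters' {-1, +1} predictions."""
    votes = np.stack([clf.predict(X) for clf in voters])  # (n_voters, n_samples)
    return np.sign(weights @ votes)

acc = np.mean(majority_vote(voters, weights, X_te) == y_te)
print("weighted-vote test accuracy:", acc)
```

Under this reading, replacing the fixed `prior_profile` by weights optimized under the constraint is exactly the gap that P-MinCq is designed to fill.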
