Learning a priori constrained weighted majority votes

Weighted majority votes allow one to combine the output of several classifiers or voters. MinCq is a recent algorithm that optimizes the weight of each voter by minimizing a theoretical bound on the risk of the vote, with elegant PAC-Bayesian generalization guarantees. However, while it has demonstrated good performance when combining weak classifiers, MinCq cannot make use of the useful a priori knowledge that one may have when using a mixture of weak and strong voters. In this paper, we propose P-MinCq, an extension of MinCq that can incorporate such knowledge in the form of a constraint over the distribution of the weights, along with general proofs of convergence that hold in the sample compression setting for data-dependent voters. The approach is applied to a vote of $k$-NN classifiers with a specific modeling of the voters' performance. P-MinCq significantly outperforms the classic $k$-NN classifier, a symmetric NN and MinCq using the same voters. We show that it is also competitive with LMNN, a popular metric learning algorithm, and that combining both approaches further reduces the error.
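To make the setting concrete, the sketch below builds a weighted majority vote over a family of $k$-NN voters whose weight distribution is fixed by an a priori profile. It only illustrates the kind of constrained vote the paper studies; it does not implement MinCq's quadratic program, P-MinCq's constraint handling, or the PAC-Bayesian bound, and the prior profile, dataset, and function names are assumptions made for illustration.

```python
# Minimal sketch (assumed setup): a weighted majority vote of k-NN voters
# whose weights follow a hand-picked a priori profile. P-MinCq would instead
# learn the weights under such a constraint by minimizing a PAC-Bayesian bound.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labels in {-1, +1}, as usual in the PAC-Bayes majority-vote setting.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
y = 2 * y - 1
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ks = [1, 3, 5, 7, 9, 11]  # one k-NN voter per value of k (assumed grid)
voters = [KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr) for k in ks]

# A priori constraint on the weight distribution: here we simply posit that
# more local voters (smaller k) get larger weight, normalized to sum to 1.
prior_profile = np.array([1.0 / k for k in ks])
weights = prior_profile / prior_profile.sum()

def majority_vote(voters, weights, X):
    """Sign of the weighted sum of the voters' {-1, +1} predictions."""
    votes = np.stack([clf.predict(X) for clf in voters])  # (n_voters, n_samples)
    return np.sign(weights @ votes)

acc = np.mean(majority_vote(voters, weights, X_te) == y_te)
print("weighted-vote test accuracy:", acc)
```

Under this reading, replacing the fixed `prior_profile` by weights optimized under the constraint is exactly the gap that P-MinCq is designed to fill.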
