Exploration of classification confidence in ensemble learning

Ensemble learning has attracted considerable attention owing to its good generalization performance. The main issues in constructing a powerful ensemble are training a set of diverse and accurate base classifiers and combining them effectively. The ensemble margin, computed as the difference between the number of votes received by the correct class and the number of votes received by the most-voted incorrect class, is widely used to explain the success of ensemble learning. This definition of the ensemble margin, however, does not take the classification confidence of the base classifiers into account. In this work, we explore the influence of the classification confidence of the base classifiers in ensemble learning and reach some interesting conclusions. First, we extend the definition of the ensemble margin based on the classification confidence of the base classifiers. Then, an optimization objective is designed to compute the weights of the base classifiers by minimizing the margin-induced classification loss. Several strategies for exploiting the classification confidences and the weights are evaluated. We observe that weighted voting based on classification confidence outperforms simple voting when all the base classifiers are used. In addition, ensemble pruning can further improve the performance of a weighted voting ensemble. We also compare the proposed fusion technique with several classical algorithms; the experimental results further demonstrate the effectiveness of weighted voting with classification confidence.
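To make the two margin notions and the fusion rule concrete, the following Python sketch contrasts the classical vote-count margin with a confidence-extended margin and shows confidence-weighted voting. This is an illustrative sketch under stated assumptions, not the paper's implementation: the array shapes, the normalization by ensemble size, and the example weights are assumptions, and in the paper the weights come from minimizing the margin-induced classification loss rather than being fixed by hand.

```python
import numpy as np

def vote_margin(votes, y, n_classes):
    """Classical ensemble margin: (votes for the correct class y minus
    votes for the most-voted other class), normalized by ensemble size.
    `votes` holds one hard label per base classifier."""
    counts = np.bincount(votes, minlength=n_classes).astype(float)
    correct = counts[y]
    counts[y] = -np.inf                  # exclude the correct class
    return (correct - counts.max()) / len(votes)

def confidence_margin(conf, y):
    """Confidence-extended margin: vote counts are replaced by summed
    classification confidences. `conf` has shape (n_classifiers,
    n_classes); row t holds classifier t's confidence in each class."""
    support = conf.sum(axis=0)           # total confidence per class
    correct = support[y]
    support[y] = -np.inf
    return (correct - support.max()) / conf.shape[0]

def weighted_confidence_vote(conf, w):
    """Weighted voting with classification confidence: predict the class
    with the largest weight-combined confidence. Here the weights w are
    hypothetical; the paper learns them from a margin-induced loss."""
    return int(np.argmax(w @ conf))

# Toy example: 3 base classifiers, 2 classes, true class y = 0.
conf = np.array([[0.9, 0.1],
                 [0.4, 0.6],
                 [0.7, 0.3]])
votes = conf.argmax(axis=1)              # hard predictions: [0, 1, 0]
w = np.array([0.5, 0.2, 0.3])            # hypothetical learned weights

print(vote_margin(votes, y=0, n_classes=2))   # (2 - 1) / 3 = 0.333...
print(confidence_margin(conf, y=0))           # (2.0 - 1.0) / 3 = 0.333...
print(weighted_confidence_vote(conf, w))      # predicts class 0
```

Note how the toy example already separates the two voting rules: simple voting counts the second classifier's vote for class 1 at full strength, whereas confidence-weighted voting discounts it by both its confidence and its learned weight.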
