A Unifying Framework for Learning the Linear Combiners for Classifier Ensembles

For classifier ensembles, an effective combination method is to combine the outputs of the individual classifiers with a linearly weighted combination rule. There are multiple ways to linearly combine classifier outputs, and it is beneficial to analyze them within a single framework. In this paper we present a unifying framework that covers multiple types of linear combination. This unification makes it possible to use the same learning algorithms for different types of linear combiners. We present several ways to train the combination weights by regularized empirical loss minimization, and we propose using the hinge loss, which yields better performance than the conventional least-squares loss. We analyze the effect of the hinge loss on the different types of linear weight training through experiments on three different databases. We show that, for certain problems, linear combiners with fewer parameters may perform as well as those with a much larger number of parameters, even in the presence of regularization.
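
To make the idea of learning combination weights by regularized empirical loss minimization concrete, the following is a minimal sketch of one such scheme: a single weight per base classifier, trained with a multiclass hinge loss and L2 regularization by stochastic subgradient descent. This is an illustrative assumption, not the specific parameterization or optimizer used in the paper; the function name, array layout, and hyperparameters are hypothetical.

```python
import numpy as np

def train_linear_combiner(scores, labels, n_epochs=50, lr=0.1, reg=1e-3):
    """Learn per-classifier weights w so the combined score sum_j w[j]*scores[j]
    minimizes a regularized multiclass hinge loss (illustrative sketch).

    scores : (n_classifiers, n_samples, n_classes) base-classifier outputs,
             e.g. estimated class posteriors.
    labels : (n_samples,) true class indices.
    """
    n_clf, n_samples, n_classes = scores.shape
    w = np.ones(n_clf) / n_clf                      # start from simple averaging

    for _ in range(n_epochs):
        for i in np.random.permutation(n_samples):
            s = scores[:, i, :]                     # (n_classifiers, n_classes)
            f = w @ s                               # combined score for each class
            y = labels[i]
            margins = f - f[y]                      # find the most violating class
            margins[y] = -np.inf
            r = int(np.argmax(margins))
            grad = reg * w                          # subgradient of the L2 penalty
            if 1.0 + f[r] - f[y] > 0.0:             # hinge loss is active
                grad += s[:, r] - s[:, y]
            w -= lr * grad
    return w
```

Replacing the hinge term with the squared difference between the combined score and a one-hot target would recover a least-squares combiner under the same framework; richer combiner types (e.g. class-specific or classifier-and-class-specific weights) only change the shape of w and the gradient bookkeeping, not the training loop.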
