An Empirical Study of a Linear Regression Combiner on Multi-class Data Sets

The meta-learner MLR (Multi-response Linear Regression) has been proposed as a trainable combiner for fusing heterogeneous base-level classifiers. Although it has interesting properties, it has never been evaluated extensively. This paper uses learning curves to investigate the relative performance of MLR on multi-class classification problems in comparison with other trainable combiners. Several strategies (namely Reusing, Validation, and Stacking) are considered for using the available data to train both the base-level classifiers and the combiner. Experimental results show that, owing to the limited complexity of MLR, it can outperform the other combiners at small sample sizes when the Validation or Stacking strategy is adopted. MLR is therefore a preferable choice of trainable combiner when solving a multi-class task with a small sample size.
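
To make the setting concrete, the following Python sketch illustrates one common reading of an MLR combiner trained with the Stacking strategy: out-of-fold class-probability outputs of the base-level classifiers form the meta-level features, and one least-squares regression per class is fit to the 0/1 class-indicator targets. This is a simplified illustration, not the paper's exact experimental setup; the choice of base learners, the data set, and the name `mlr_combiner` are assumptions, and the original MLR formulation additionally constrains the regression weights to be non-negative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
n_classes = len(np.unique(y_train))

# Stacking strategy: meta-level features are the out-of-fold class probabilities
# produced by each base-level classifier on the training data.
meta_train = np.hstack([
    cross_val_predict(clf, X_train, y_train, cv=5, method="predict_proba")
    for clf in base_learners
])

# MLR combiner: one linear regression per class on the class-indicator targets
# (plain least squares here; the original variant uses non-negative weights).
mlr_combiner = LinearRegression().fit(meta_train, np.eye(n_classes)[y_train])

# At test time the base learners are refit on the full training set and the
# combiner picks the class whose regression output is largest.
meta_test = np.hstack([
    clf.fit(X_train, y_train).predict_proba(X_test) for clf in base_learners
])
y_pred = np.argmax(mlr_combiner.predict(meta_test), axis=1)
print("combiner accuracy:", np.mean(y_pred == y_test))
```

Under the Reusing strategy the same training predictions (rather than out-of-fold ones) would feed the combiner, while the Validation strategy would reserve a separate split for it; the sketch above only varies in how `meta_train` is built.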
