Empirical evaluation of optimized stacking configurations

Stacking is one of the most widely used techniques for combining classifiers and improving prediction accuracy. Early research on stacking showed that selecting the right base classifiers, their parameters, and the metaclassifier was the main bottleneck for its use. Most research on this topic selects the combination of classifiers and their parameters by hand. Instead of starting from such strong initial assumptions, our approach uses genetic algorithms to search for good stacking configurations. Since this search can lead to overfitting, one goal of this work is to empirically evaluate the overall efficiency of the approach. A second goal is to compare our approach with the current best stacking-construction techniques. The results show that our approach finds stacking configurations that, in the worst case, perform as well as the best techniques, with the advantage of not having to manually set up the structure of the stacking system.
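The paper's exact genome encoding and genetic operators are not reproduced here, so the following is only a minimal sketch of the general idea in Python, assuming scikit-learn's StackingClassifier as the stacking machinery (an assumption; the original work predates it and used other tools). Each genome selects a subset of base classifiers plus one metaclassifier, and fitness is cross-validated accuracy; all classifier choices and hyperparameters below are illustrative, not taken from the paper.

```python
# Minimal sketch (not the paper's implementation): a genetic algorithm
# searches over stacking configurations. A genome is a bitmask over the
# pool of base classifiers plus an index into a pool of metaclassifiers.
import random

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

BASE = [("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0))]
META = [LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0)]

def fitness(genome, X, y):
    """Cross-validated accuracy of the stacking configuration a genome encodes."""
    mask, meta_idx = genome
    chosen = [pair for bit, pair in zip(mask, BASE) if bit]
    if not chosen:  # an empty ensemble is invalid
        return 0.0
    stack = StackingClassifier(estimators=chosen,
                               final_estimator=META[meta_idx], cv=3)
    return cross_val_score(stack, X, y, cv=3).mean()

def mutate(genome):
    mask, meta_idx = genome
    mask = [bit ^ (random.random() < 0.2) for bit in mask]  # flip base-learner bits
    if random.random() < 0.2:  # occasionally swap the metaclassifier
        meta_idx = random.randrange(len(META))
    return (mask, meta_idx)

def evolve(X, y, pop_size=8, generations=5):
    pop = [([random.randint(0, 1) for _ in BASE], random.randrange(len(META)))
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fitness(g, X, y), reverse=True)
        elite = scored[:pop_size // 2]  # truncation selection
        pop = elite + [mutate(random.choice(elite)) for _ in elite]
    return max(pop, key=lambda g: fitness(g, X, y))

X, y = load_iris(return_X_y=True)
print("best configuration:", evolve(X, y))
```

Because fitness is itself an accuracy estimate computed on the available data, a search like this can overfit the fitness measure, which is exactly the risk the paper's empirical evaluation is meant to quantify.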
