GA-stacking: Evolutionary stacked generalization

Stacking is a widely used technique for combining classifiers and improving prediction accuracy. Early research on Stacking showed that selecting the right base classifiers, their parameters, and the meta-classifier is a critical issue. Most research on this topic hand-picks that combination. Instead of starting from such strong initial assumptions, our approach uses genetic algorithms to search for good Stacking configurations. Since this search can lead to overfitting, one goal of this paper is to evaluate the overall efficiency of the approach empirically. A second goal is to compare it with the current best Stacking-building techniques. The results show that our approach finds Stacking configurations that, in the worst case, perform as well as those of the best techniques, with the advantage of not having to set up the structure of the Stacking system manually.
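
To make the approach concrete, the sketch below shows one way a genetic algorithm can search the space of Stacking configurations. It is a minimal illustration under stated assumptions, not the paper's implementation: the chromosome here encodes only which base classifiers enter the ensemble (the paper's configurations also cover classifier parameters and the choice of meta-classifier), scikit-learn learners stand in for the paper's classifier pool, and the population size, mutation rate, and fixed logistic-regression meta-classifier are illustrative choices.

```python
import random

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Candidate base classifiers; a hypothetical pool standing in for whatever
# learners the paper actually searches over.
BASE_POOL = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
]


def fitness(chromosome, X, y):
    """Cross-validated accuracy of the stacked ensemble encoded by the
    chromosome (one bit per candidate base classifier)."""
    chosen = [est for bit, est in zip(chromosome, BASE_POOL) if bit]
    if not chosen:  # an empty ensemble is invalid
        return 0.0
    stack = StackingClassifier(
        estimators=chosen,
        final_estimator=LogisticRegression(max_iter=1000),
    )
    return cross_val_score(stack, X, y, cv=3).mean()


def evolve(X, y, pop_size=10, generations=5, p_mut=0.1, seed=0):
    """Simple generational GA: truncation selection, one-point crossover,
    bit-flip mutation. Returns the best chromosome found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in BASE_POOL] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, X, y), reverse=True)
        survivors = ranked[: pop_size // 2]  # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(BASE_POOL))      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutate
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, X, y))


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    best = evolve(X, y)
    print("best chromosome:", best)
```

Using cross-validated accuracy as the fitness function is what guards the search against the overfitting mentioned above; a held-out test set would still be needed to confirm that the selected configuration generalizes.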
