Systematic Ensemble Learning for Regression

This work aims to improve the performance of standard stacking ensembles, composed of simple, heterogeneous base models, by integrating the generation and selection stages for regression problems. We propose two extensions to the standard stacking approach. The first combines a set of standard stacking approaches into an ensemble of ensembles using two-step ensemble learning in the regression setting. The second consists of two parts: first, a diversity mechanism is injected into the original training data, systematically generating different training subsets (partitions) and the corresponding ensembles of ensembles; second, after measuring the quality of the different partitions and their ensembles, a max-min rule-based selection algorithm chooses the most appropriate ensemble/partition on which to make the final prediction. Experiments over a broad range of data sets show that the second extension outperforms the best of the standard stacking approaches and matches an oracle that selects, by cross-validation, the best base model for each data set. It also outperforms two state-of-the-art ensemble methods for regression and is as good as a third.
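To make the pipeline concrete, the following is a minimal sketch of the second extension under several assumptions: it uses scikit-learn's StackingRegressor for both stacking levels, random subsets as a stand-in for the paper's partition-generation mechanism, and a worst-fold-error criterion as an assumed reading of the max-min selection rule. It is illustrative only, not the authors' implementation.

```python
# Hypothetical sketch of the second extension, not the authors' code.
# Partition generation, quality scoring, and the max-min rule below are
# simplified stand-ins for the procedures described in the paper.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import KFold, cross_val_score


def base_stacker():
    """One standard stacking ensemble of simple, heterogeneous base models."""
    return StackingRegressor(
        estimators=[
            ("lr", LinearRegression()),
            ("tree", DecisionTreeRegressor(max_depth=5)),
            ("knn", KNeighborsRegressor(n_neighbors=7)),
        ],
        final_estimator=Ridge(),
    )


def ensemble_of_ensembles():
    """Second-level stacking that combines several standard stackers."""
    return StackingRegressor(
        estimators=[(f"stack{i}", base_stacker()) for i in range(3)],
        final_estimator=Ridge(),
    )


X, y = make_regression(n_samples=600, n_features=10, noise=10.0, random_state=0)

# Diversity injection: systematically generate different training partitions
# (here: random 70% subsets; the paper's mechanism may differ).
rng = np.random.default_rng(0)
partitions = [rng.choice(len(X), size=int(0.7 * len(X)), replace=False)
              for _ in range(5)]

# Measure partition quality with cross-validated error of its ensemble of
# ensembles, then apply a max-min style rule: keep the partition whose
# worst-fold error is smallest (an assumed interpretation of the rule).
worst_fold_error = []
for idx in partitions:
    scores = cross_val_score(ensemble_of_ensembles(), X[idx], y[idx],
                             scoring="neg_mean_squared_error",
                             cv=KFold(n_splits=5, shuffle=True, random_state=0))
    worst_fold_error.append(-scores.min())  # largest MSE across folds

best = int(np.argmin(worst_fold_error))
final_model = ensemble_of_ensembles().fit(X[partitions[best]], y[partitions[best]])
print("selected partition:", best, "worst-fold MSE:", worst_fold_error[best])
```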
