Ensemble approaches for regression: A survey

The goal of ensemble regression is to combine several models to improve prediction accuracy in learning problems with a numerical target variable. The process of ensemble learning can be divided into three phases: the generation phase, the pruning phase, and the integration phase. We discuss different approaches to each of these phases that can handle the regression problem, categorizing them in terms of their relevant characteristics and linking them to contributions from different fields. This categorization also makes it possible to identify promising areas for future research.
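The three phases can be illustrated with a minimal sketch. The concrete choices below are assumptions made only for illustration and are not prescribed by the survey: generation by fitting polynomials to bootstrap resamples, pruning by ranking models on validation error, and integration by simple averaging. The function names `generate`, `prune`, and `integrate` are hypothetical.

```python
import numpy as np


def generate(x, y, n_models=10, degree=3, seed=0):
    """Generation phase (assumed variant): one polynomial per bootstrap resample."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Sample len(x) indices with replacement, as in bagging.
        idx = rng.integers(0, len(x), size=len(x))
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models


def prune(models, x_val, y_val, keep=5):
    """Pruning phase (assumed variant): keep the models with lowest validation MSE."""
    errors = [np.mean((np.polyval(m, x_val) - y_val) ** 2) for m in models]
    best = np.argsort(errors)[:keep]
    return [models[i] for i in best]


def integrate(models, x_new):
    """Integration phase (assumed variant): unweighted average of predictions."""
    return np.mean([np.polyval(m, x_new) for m in models], axis=0)


# Synthetic data, purely illustrative: a noisy sine curve on [-1, 1].
data_rng = np.random.default_rng(42)
x_train = data_rng.uniform(-1.0, 1.0, size=100)
y_train = np.sin(3.0 * x_train) + data_rng.normal(0.0, 0.1, size=100)
x_val = data_rng.uniform(-1.0, 1.0, size=50)
y_val = np.sin(3.0 * x_val) + data_rng.normal(0.0, 0.1, size=50)

pool = generate(x_train, y_train)          # generation
pruned = prune(pool, x_val, y_val)         # pruning
pred = integrate(pruned, np.array([-0.5, 0.0, 0.5]))  # integration
```

Each phase is a separate function so that any one of them can be swapped for another approach (for example, a weighted combination in place of the plain average) without touching the other two.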
