Ensemble-based classifiers

The idea of ensemble methodology is to build a predictive model by integrating multiple models. It is well known that ensemble methods can improve prediction performance. Researchers from various disciplines, such as statistics and AI, have considered the use of ensemble methodology. This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners who are interested in building ensemble-based systems.
