Ensemble based systems in decision making

In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting "several experts" before making a final decision is perhaps second nature to us; yet, the extensive benefits of such a process in automated decision-making applications have only recently been discovered by the computational intelligence community. Also known under various other names, such as multiple classifier systems, committees of classifiers, or mixtures of experts, ensemble based systems have been shown to produce favorable results compared to those of single-expert systems for a broad range of applications and under a variety of scenarios. The design, implementation, and application of such systems are the main topics of this article. Specifically, this paper reviews the conditions under which ensemble based systems may be more beneficial than their single-classifier counterparts, algorithms for generating the individual components of ensemble systems, and various procedures through which the individual classifiers can be combined. We discuss popular ensemble based algorithms, such as bagging, boosting, AdaBoost, stacked generalization, and hierarchical mixture of experts, as well as commonly used combination rules, including algebraic combination of outputs, voting based techniques, behavior knowledge space, and decision templates. Finally, we look at current and future research directions for novel applications of ensemble systems. Such applications include incremental learning, data fusion, feature selection, learning with missing features, confidence estimation, and error-correcting output codes, all areas in which ensemble systems have shown great promise.
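To make two of the ideas named above concrete, the sketch below combines bagging (training each classifier on a bootstrap replicate of the data) with the simplest voting-based combination rule, plurality voting. This is an illustrative sketch, not the article's reference implementation: the synthetic dataset, the choice of shallow decision trees as base learners, the ensemble size of 25, and the use of scikit-learn are all assumptions made here for brevity.

```python
# Minimal sketch: bagging + plurality voting (illustrative assumptions:
# synthetic data, shallow decision trees, 25 ensemble members).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
ensemble = []
for _ in range(n_estimators):
    # Bagging: draw a bootstrap replicate (sample with replacement),
    # then train one base classifier on it.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    clf = DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx])
    ensemble.append(clf)

# Combination rule: plurality voting over the members' label predictions.
votes = np.stack([clf.predict(X_test) for clf in ensemble])  # (n_estimators, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", (majority == y_test).mean())
```

Shallow trees are a deliberate choice here: bagging and voting pay off most when the individual classifiers are diverse and individually weak, since their uncorrelated errors tend to cancel under the vote.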
