Error Correlation and Error Reduction in Ensemble Classifiers

Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining, however, often depend more on the selection of what is presented to the combiner than on the combining method itself. In this paper, we focus on data selection and classifier training methods that "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. We then discuss several methods that make the classifiers in an ensemble more complementary. Experimental results illustrate both the benefits and the pitfalls of reducing the correlation among classifiers, especially when the training data are in limited supply.
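The abstract's central claim can be illustrated with a small simulation (a sketch, not taken from the paper's experiments). Each of five classifiers has the same 30% marginal error rate, but their errors are coupled through a shared latent draw whose weight sets the error correlation. When errors are independent, majority voting drives the ensemble error well below the individual rate; when errors are perfectly correlated, combining yields no gain at all. The generative model and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classifiers, p_err = 100_000, 5, 0.3

def majority_vote_error(correlation):
    """Estimate the majority-vote error of an ensemble whose members
    each err with marginal probability p_err, with pairwise error
    correlation controlled by `correlation` in [0, 1]."""
    # Shared latent error indicator (e.g., intrinsically hard samples).
    shared = rng.random(n_samples) < p_err
    # Classifier-specific independent error indicators.
    independent = rng.random((n_samples, n_classifiers)) < p_err
    # Each member copies the shared draw with prob = correlation,
    # which preserves the marginal error rate at p_err.
    use_shared = rng.random((n_samples, n_classifiers)) < correlation
    errors = np.where(use_shared, shared[:, None], independent)
    # The ensemble errs when a strict majority of members err.
    return (errors.sum(axis=1) > n_classifiers // 2).mean()

for rho in (0.0, 0.5, 1.0):
    print(f"correlation={rho:.1f}  majority-vote error={majority_vote_error(rho):.3f}")
```

With independent errors (correlation 0), the majority-vote error approaches the binomial tail P(X >= 3) for X ~ Bin(5, 0.3), roughly 0.16, versus the full 0.30 when the members always agree on their mistakes; this is the quantitative sense in which decorrelating classifiers reduces ensemble error.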
