Input decimated ensembles

Abstract. Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore Input Decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses these subsets to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that, over a wide range of domains, input decimated ensembles outperform ensembles whose base classifiers use all the input features, randomly selected feature subsets, or features created using principal components analysis.
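The core idea of Input Decimation, ranking features by how well they discriminate each class and training one decoupled base classifier per class-specific subset, can be sketched as follows. This is a minimal illustration only: it assumes a per-class feature-target correlation as the discrimination criterion and substitutes a nearest-centroid base learner for the neural networks used in the article; the names `input_decimation_subsets` and `DecimatedEnsemble` are hypothetical.

```python
import numpy as np

def input_decimation_subsets(X, y, k):
    """For each class, rank features by the absolute correlation between the
    feature and that class's 0/1 indicator, and keep the top-k indices."""
    subsets = []
    for label in np.unique(y):
        target = (y == label).astype(float)
        corrs = np.array([abs(np.corrcoef(X[:, j], target)[0, 1])
                          for j in range(X.shape[1])])
        subsets.append(np.argsort(corrs)[::-1][:k])
    return subsets

class DecimatedEnsemble:
    """One nearest-centroid base classifier per class-specific feature subset;
    base predictions are combined by majority vote."""

    def fit(self, X, y, k):
        self.subsets = input_decimation_subsets(X, y, k)
        self.classes = np.unique(y)
        # Each base model stores per-class centroids over its own subset.
        self.centroids = [
            np.stack([X[y == c][:, idx].mean(axis=0) for c in self.classes])
            for idx in self.subsets
        ]
        return self

    def predict(self, X):
        votes = np.zeros((X.shape[0], len(self.classes)))
        for idx, cent in zip(self.subsets, self.centroids):
            # Squared distance from every sample (on this subset) to each centroid.
            d = ((X[:, idx][:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            votes[np.arange(X.shape[0]), d.argmin(axis=1)] += 1
        return self.classes[votes.argmax(axis=1)]
```

Because each base classifier sees a different, class-tailored slice of the input, their errors are less correlated than those of classifiers trained on the full feature set, which is the property the abstract's decorrelation argument relies on.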
