Constructing Ensembles of Classifiers by Means of Weighted Instance Selection

In this paper, we approach the problem of constructing ensembles of classifiers from the point of view of instance selection. Instance selection aims to obtain a subset of the available training instances that achieves at least the same performance as the whole training set. In this way, instance selection algorithms try to keep the performance of the classifiers while reducing the number of instances in the training set. Boosting methods, in turn, construct an ensemble of classifiers iteratively, focusing each new member on the most difficult instances by means of a biased distribution over the training instances. In this work, we show how these two methodologies can be combined advantageously. Instance selection algorithms can be used for boosting by taking as the objective the minimization of the training error weighted by the biased instance distribution supplied by the boosting method. Our method can thus be regarded as boosting by instance selection. Because instance selection has mostly been developed and used for k-nearest neighbor (k-NN) classifiers, our methodology is, as a first step, applied to constructing ensembles of k-NN classifiers. Constructing ensembles of classifiers by means of instance selection has the important feature of reducing the space complexity of the final ensemble, since only a subset of the instances is selected for each classifier. However, the methodology is not restricted to k-NN classifiers. Other classifiers, such as decision trees and support vector machines (SVMs), may also benefit from a smaller training set, as they yield simpler models when an instance selection algorithm is applied before training. In the experimental section, we show that the proposed approach produces better and simpler ensembles than the random subspace method (RSM) for k-NN and than standard ensemble methods for C4.5 and SVMs.
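To make the idea concrete, the sketch below illustrates boosting by instance selection for a 1-NN base classifier. It is only a minimal illustration of the objective described above, not the authors' exact algorithm: the subset search is a simple random hill-climbing step standing in for any instance selection method, and all function and parameter names are hypothetical.

```python
# Minimal sketch of boosting by instance selection with a 1-NN base classifier.
# Assumption: an AdaBoost.M1-style reweighting loop; the instance selection step
# (select_instances) is a placeholder hill climber, not the paper's algorithm.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def weighted_error(subset_mask, X, y, w):
    """Training error of a 1-NN classifier built on the selected subset,
    weighted by the boosting distribution w."""
    if subset_mask.sum() == 0:
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X[subset_mask], y[subset_mask])
    return float(np.sum(w * (knn.predict(X) != y)))


def select_instances(X, y, w, iters=200, rng=None):
    """Hypothetical instance selection: random bit-flip hill climbing that
    minimizes the weighted training error (the boosting-biased objective)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(len(y)) < 0.5            # random initial subset
    best = weighted_error(mask, X, y, w)
    for _ in range(iters):
        cand = mask.copy()
        cand[rng.integers(len(y))] ^= True     # flip one instance in or out
        err = weighted_error(cand, X, y, w)
        if err <= best:
            mask, best = cand, err
    return mask, best


def boosting_by_instance_selection(X, y, rounds=10, seed=0):
    """Boosting loop whose base learner is 1-NN trained on a selected subset."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                    # boosting distribution over instances
    ensemble = []                              # list of (alpha, fitted 1-NN) pairs
    for _ in range(rounds):
        mask, err = select_instances(X, y, w, rng=rng)
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)
        knn = KNeighborsClassifier(n_neighbors=1).fit(X[mask], y[mask])
        ensemble.append((alpha, knn))
        # Reweight: misclassified instances gain weight, as in AdaBoost.
        miss = knn.predict(X) != y
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        w /= w.sum()
    return ensemble


def predict(ensemble, X, classes):
    """Weighted vote of the ensemble members; classes is a numpy array of labels."""
    votes = np.zeros((len(X), len(classes)))
    for alpha, knn in ensemble:
        pred = knn.predict(X)
        for j, c in enumerate(classes):
            votes[:, j] += alpha * (pred == c)
    return classes[np.argmax(votes, axis=1)]
```

Because each member stores only its selected subset, the memory footprint of the ensemble grows with the subset sizes rather than with the full training set, which is the space-complexity advantage mentioned above.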
