On the use of meta-learning for instance selection: An architecture and an experimental study

Many authors agree that, when applying instance selection to a data set, it would be useful to characterize the data set in order to choose the most suitable selection criterion. Based on this hypothesis, we propose an architecture for knowledge-based instance selection (KBIS) systems. It uses meta-learning to choose, from several available methods, the instance selection method best suited to each specific data set. We conducted a study to verify whether this architecture can outperform the individual methods. We instantiated two versions of a KBIS system based on our architecture, each using a different learner, evaluated them experimentally, and compared their results with those of the individual methods they draw on.
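The meta-learning step of such an architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the KBIS systems evaluated in the study. The five characterization measures (instance count, feature count, class count, imbalance ratio, and the fraction of instances whose nearest neighbour has a different label), the 1-nearest-neighbour meta-learner, the names `meta_features`, `MetaSelector` and `META_DB`, and the method labels `method_A` to `method_C` are all hypothetical. The meta-database entries are invented for illustration and are not results from any experiment.

```python
import math
from collections import Counter


def meta_features(X, y):
    """Characterize a labelled data set (needs at least two instances).

    The measures below are illustrative assumptions, not the
    characterization used by the KBIS systems in the study.
    """
    n, d = len(X), len(X[0])
    counts = Counter(y)
    # Imbalance ratio: majority-class size divided by minority-class size.
    imbalance = max(counts.values()) / min(counts.values())
    # Fraction of instances whose Euclidean nearest neighbour carries a
    # different label: a crude indicator of class overlap or label noise.
    disagree = 0
    for i in range(n):
        nn = min((j for j in range(n) if j != i),
                 key=lambda j: math.dist(X[i], X[j]))
        if y[nn] != y[i]:
            disagree += 1
    return (float(n), float(d), float(len(counts)), imbalance, disagree / n)


class MetaSelector:
    """Hypothetical 1-NN meta-learner: recommends the instance selection
    method that performed best on the most similar known data set."""

    def __init__(self, meta_db):
        # meta_db: list of (meta_feature_tuple, best_method_name) pairs.
        self.meta_db = meta_db
        cols = list(zip(*(f for f, _ in meta_db)))
        self.lo = [min(c) for c in cols]
        self.hi = [max(c) for c in cols]

    def _scale(self, f):
        # Min-max scaling so that no single measure dominates the distance.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(f, self.lo, self.hi)]

    def recommend(self, f):
        q = self._scale(f)
        _, best = min(self.meta_db,
                      key=lambda entry: math.dist(q, self._scale(entry[0])))
        return best


# Hypothetical meta-database. A real system would fill it by running every
# candidate method on previously seen data sets; these values are invented.
META_DB = [
    ((200.0, 4.0, 2.0, 1.0, 0.05), "method_A"),
    ((200.0, 4.0, 2.0, 1.0, 0.35), "method_B"),
    ((1000.0, 20.0, 5.0, 4.0, 0.10), "method_C"),
]

if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    print(MetaSelector(META_DB).recommend(meta_features(X, y)))  # prints method_A
```

A 1-NN meta-learner is used here only because it is the simplest learner that maps a new data set to the method that won on its most similar predecessor. The study's two KBIS versions use their own learners, which this sketch does not reproduce, and applying the recommended instance selection method is left out.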
