Paper information - On the use of meta-learning for instance selection: An architecture and an experimental study

Many authors agree that, when applying instance selection to a data set, it is useful to characterize the data set first in order to choose the most suitable selection criterion. Based on this hypothesis, we propose an architecture for knowledge-based instance selection (KBIS) systems. It uses meta-learning to select, from several available instance selection methods, the one best suited to each specific data set. We carried out a study to verify whether this architecture can outperform the individual methods. We instantiated two versions of a KBIS system based on our architecture, each using a different learner. Both were evaluated experimentally, and their results were compared with those of the individual methods used.
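The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the meta-features (data set size, dimensionality, number of classes, class entropy), the 1-nearest-neighbour meta-learner, the labels "CNN" and "ENN", and the knowledge-base values are all assumptions chosen for illustration, since the abstract names neither the data set characterization nor the two learners.

```python
import math
from collections import Counter


def meta_features(X, y):
    """Describe a data set with a few simple, illustrative measures (assumed)."""
    n = len(X)
    counts = Counter(y)
    # Class entropy in bits: 0 for a single class, log2(k) for k balanced classes.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (math.log10(n), float(len(X[0])), float(len(counts)), entropy)


class NearestMetaLearner:
    """Hypothetical 1-NN meta-learner: recommend the method recorded as best
    for the most similar previously characterized data set."""

    def fit(self, meta_X, best_methods):
        self.meta_X = list(meta_X)
        self.best_methods = list(best_methods)
        return self

    def predict(self, features):
        # Index of the stored meta-feature vector closest in Euclidean distance.
        idx = min(
            range(len(self.meta_X)),
            key=lambda i: math.dist(self.meta_X[i], features),
        )
        return self.best_methods[idx]


def recommend(learner, X, y):
    """Characterize a new data set and ask the meta-learner for a method."""
    return learner.predict(meta_features(X, y))


# Made-up knowledge base: meta-features of past data sets and the instance
# selection method assumed to have performed best on each of them.
knowledge_base = [(2.0, 4.0, 2.0, 1.0), (4.0, 20.0, 5.0, 2.0)]
best_method = ["CNN", "ENN"]
learner = NearestMetaLearner().fit(knowledge_base, best_method)

# A new data set: 100 instances, 4 features, two balanced classes.
X_new = [[0.0, 0.0, 0.0, 0.0] for _ in range(100)]
y_new = [i % 2 for i in range(100)]
print(recommend(learner, X_new, y_new))
```

In a full system the recommended method would then be run on the new data set, and the knowledge base would be built by actually evaluating every candidate method on a collection of training data sets.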
