A Set of Complexity Measures Designed for Applying Meta-Learning to Instance Selection

In recent years, some authors have approached the instance selection problem from a meta-learning perspective. Their work seeks relationships between the performance of instance selection methods and the values of data-complexity measures, with the aim of determining the best-performing method for a given data set using only the values of the measures computed on that data set. Nevertheless, most of the data-complexity measures in the literature were not conceived for this purpose, and the feasibility of their use in this field is yet to be determined. In this paper, we revise the definitions of some measures, presented in our previous work, that were designed for meta-learning-based instance selection. We also assess them in an experimental study involving three sets of measures, 59 databases, 16 instance selection methods, two classifiers, and eight regression learners used as meta-learners. The results suggest that our measures are more efficient and effective than those traditionally used by researchers who have addressed instance selection from a meta-learning perspective.
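The general meta-learning idea described above can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the k-nearest-neighbour regressor stands in for whichever regression learner serves as meta-learner, and the names `knn_predict`, `recommend_method`, `method_A`, `method_B`, together with all numeric values, are hypothetical and chosen only for illustration. Each known data set is described by a vector of complexity-measure values, one regressor per instance selection method predicts that method's performance from the vector, and the method with the highest predicted performance is recommended for a new data set.

```python
from math import dist


def knn_predict(train_x, train_y, query, k=1):
    """Average the targets of the k training points closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def recommend_method(meta_features, performance, query, k=1):
    """Return the method with the highest predicted performance.

    meta_features: one complexity-measure vector per known data set.
    performance:   method name -> scores aligned with meta_features.
    query:         complexity-measure vector of the new data set.
    """
    predicted = {
        method: knn_predict(meta_features, scores, query, k)
        for method, scores in performance.items()
    }
    best = max(predicted, key=predicted.get)
    return best, predicted


# Toy meta-data set (invented values, two hypothetical measures per data set).
meta_features = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1)]
performance = {
    "method_A": [0.90, 0.85, 0.60, 0.55],  # hypothetical scores
    "method_B": [0.70, 0.72, 0.80, 0.82],  # hypothetical scores
}

best, predicted = recommend_method(meta_features, performance, (0.85, 0.15), k=2)
print(best, predicted)
```

In this toy setting the new data set lies closest to the last two known data sets, so the recommendation follows the method that scored better on those. The recommendation requires only the measure values of the new data set, which is why the cost of computing the measures matters alongside their predictive value.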
