A Feature Subset Selection Algorithm Automatic Recommendation Method

Many feature subset selection (FSS) algorithms have been proposed, yet not all of them are appropriate for a given feature selection problem, and so far there is no good way to choose appropriate FSS algorithms for the problem at hand. Automatic FSS algorithm recommendation is therefore both important and practically useful. This paper presents a meta-learning-based method for automatic FSS algorithm recommendation. The proposed method first identifies the data sets most similar to the one at hand using the k-nearest-neighbor algorithm, where the distances between data sets are computed from commonly used data set characteristics. It then ranks all candidate FSS algorithms by their performance on these similar data sets and recommends the best-performing ones. The performance of each candidate FSS algorithm is evaluated with a multi-criteria metric that takes into account not only the classification accuracy achieved with the selected features, but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real-world data sets with 22 well-known and frequently used FSS algorithms and five representative classifiers. The results show the effectiveness of the proposed FSS algorithm recommendation method.
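The recommendation pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the Euclidean distance over meta-features, the equal weighting of the k neighbors, and the linear weights in the multi-criteria score are all assumptions made for the sketch.

```python
import numpy as np

def multi_criteria_score(accuracy, runtime, n_selected, w=(1.0, 0.01, 0.01)):
    """Combine accuracy, runtime, and number of selected features into one
    score (higher is better). The weights are illustrative placeholders, not
    the metric used in the paper."""
    return w[0] * accuracy - w[1] * runtime - w[2] * n_selected

def recommend_fss_algorithms(target_meta, historical_meta, performance,
                             k=3, top_n=2):
    """Recommend FSS algorithms for a new data set.

    target_meta     : 1-D array of meta-features of the data set at hand.
    historical_meta : (n_datasets, n_meta_features) meta-features of the
                      historical data sets.
    performance     : (n_datasets, n_algorithms) multi-criteria scores of
                      each candidate FSS algorithm on each historical data
                      set (higher is better).
    Returns the indices of the top_n recommended algorithms.
    """
    # Step 1: find the k historical data sets nearest to the target,
    # measured by Euclidean distance over the meta-features.
    dists = np.linalg.norm(historical_meta - target_meta, axis=1)
    neighbors = np.argsort(dists)[:k]
    # Step 2: average each algorithm's score over those k similar data sets.
    avg_scores = performance[neighbors].mean(axis=0)
    # Step 3: rank algorithms from best to worst and recommend the top ones.
    ranking = np.argsort(avg_scores)[::-1]
    return ranking[:top_n]
```

In use, the meta-features of each historical data set and the performance matrix would be computed offline; only the k-nearest-neighbor lookup and the ranking run at recommendation time.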
