Feature Selection: A Practitioner View

Feature selection is one of the most important preprocessing steps in data mining and knowledge engineering. In this short review paper, in addition to a brief taxonomy of current feature selection methods, we review the feature selection methods that are used in practice. We then compile a near-comprehensive list of problems that have been solved using feature selection across technical and commercial domains. This list can serve as a valuable reference for practitioners in industry and academia. We also present empirical results for filter-based methods on various datasets. The empirical study covers the tasks of classification, regression, text classification, and clustering. Finally, we compare filter-based ranking methods using rank correlation.
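
The idea of comparing filter-based rankings by rank correlation can be illustrated with a minimal sketch. It is not the procedure used in the paper: the two filter scores (absolute Pearson correlation with the target, and feature variance), the toy data, and all function names are assumptions chosen only for illustration. Each filter scores every feature, the scores are turned into ranks, and Spearman's rank correlation measures how closely the two rankings agree.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(scores):
    """Rank 1 goes to the highest score; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(scores_a), ranks(scores_b))


# Hypothetical toy data: three feature columns and a binary target.
features = {
    "f1": [0.1, 0.4, 0.35, 0.8, 0.9, 0.95],
    "f2": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "f3": [1.0, 1.1, 0.9, 1.2, 1.3, 1.25],
}
target = [0, 0, 0, 1, 1, 1]

# Filter A (assumed): absolute correlation of each feature with the target.
score_corr = [abs(pearson(col, target)) for col in features.values()]
# Filter B (assumed): unsupervised variance of each feature.
score_var = [pvariance(col) for col in features.values()]

agreement = spearman(score_corr, score_var)
print("Rank agreement between the two filters:", round(agreement, 3))
```

A value near 1 means the two filters order the features almost identically, a value near -1 means they order them in reverse, and a value near 0 means the rankings are largely unrelated.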
