FEATURE SELECTION METHODS AND ALGORITHMS

Feature selection is an important topic in data mining, especially for high-dimensional datasets. Feature selection (also known as subset selection) is a process commonly used in machine learning in which a subset of the features available in the data is selected for application of a learning algorithm. The best subset contains the smallest number of dimensions that contribute most to accuracy; the remaining, unimportant dimensions are discarded. This is an important preprocessing stage and one of two ways of avoiding the curse of dimensionality (the other being feature extraction). Two basic search strategies in feature selection are forward selection and backward elimination. Feature selection has been an active research area in the pattern recognition, statistics, and data mining communities. Its main idea is to choose a subset of the input variables by eliminating features with little or no predictive information. Feature selection methods fall into three broad classes: filter methods, wrapper methods, and embedded methods. This paper presents an empirical comparison of feature selection methods and their algorithms. In view of the substantial number of existing feature selection algorithms, criteria are needed to decide adequately which algorithm to use in a given situation. This work reviews several fundamental algorithms found in the literature and assesses their performance in a controlled scenario.

Keywords: Feature Selection, Feature Selection Methods, Feature Selection Algorithms.
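To make the forward-selection strategy mentioned above concrete, the following is a minimal sketch of greedy forward selection, a wrapper-style method: starting from the empty set, it repeatedly adds the single feature that most improves a user-supplied score and stops when no candidate helps. The scoring function and feature names here are illustrative assumptions, not drawn from the paper; in practice the score would typically be cross-validated accuracy of a learner on the candidate subset.

```python
def forward_selection(features, score):
    """Greedily build a feature subset (wrapper-style forward selection).

    features: list of candidate feature names.
    score: callable mapping a tuple of feature names to a number,
           e.g. cross-validated accuracy of a learner on that subset.
    Returns the selected subset and its score.
    """
    selected = []
    best = score(tuple(selected))
    improved = True
    while improved:
        improved = False
        best_feature = None
        for f in features:
            if f in selected:
                continue
            s = score(tuple(selected) + (f,))
            if s > best:  # keep the single best improving feature
                best, best_feature = s, f
        if best_feature is not None:
            selected.append(best_feature)
            improved = True
    return selected, best


# Toy scorer (hypothetical): pretend only "x1" and "x3" carry signal,
# and each irrelevant feature slightly hurts the score, mimicking the
# cost of overfitting to uninformative dimensions.
def toy_score(subset):
    gain = sum(0.3 for f in subset if f in ("x1", "x3"))
    penalty = 0.05 * sum(1 for f in subset if f not in ("x1", "x3"))
    return 0.5 + gain - penalty


if __name__ == "__main__":
    chosen, final = forward_selection(["x1", "x2", "x3", "x4"], toy_score)
    print(sorted(chosen), round(final, 2))  # → ['x1', 'x3'] 1.1
```

Backward elimination is the mirror image: start from the full feature set and greedily remove the feature whose removal improves (or least degrades) the score, stopping when every removal hurts.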
