A literature review of feature selection techniques and applications: Review of feature selection in data mining

Water is the elixir of life and a vital component of human survival. It should be purified to support a better and healthier life for all living things, and its quality plays an important role for all living beings. Water used for drinking should be colourless, odourless and free from excess salts. Detecting the variety of contaminants in drinking water is a challenging task. Feature selection plays a significant role in identifying irrelevant and redundant features in large datasets. It is a preprocessing step widely used for large amounts of data. Feature selection picks a subset of features (a list of attributes or variables) that helps to build an efficient model for describing the selected subset. Beyond selecting the subset, it also serves other purposes, such as reducing dimensionality, reducing the amount of data required for the learning process, improving predictive accuracy and improving the constructed models. The main aim of this work is to investigate the concept of feature selection and the various criteria used by feature selection methods, to discuss some existing methods published from 1997 to 2014, and to address the issues and challenges of feature selection.
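As an illustration of what identifying irrelevant and redundant features can look like in practice, the following is a minimal sketch of a correlation-based filter. It is not a method taken from this review: the function names, the thresholds (`min_relevance`, `max_redundancy`) and the toy water-quality columns are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Greedy filter: keep features that are relevant to the target and
    not strongly correlated with a feature that was already kept.
    Both thresholds are illustrative assumptions, not recommended values."""
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # irrelevant: carries little information about the target
        if any(abs(pearson(columns[name], columns[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant: nearly duplicates a feature already kept
        selected.append(name)
    return selected


# Hypothetical toy data: six water samples, target 1 = contaminated.
columns = {
    "salinity":     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "conductivity": [1.0, 2.0, 3.0, 4.0, 5.0, 7.0],  # tracks salinity closely
    "ph_shift":     [0.0, 1.0, 0.0, 1.0, 2.0, 1.0],
    "turbidity":    [1.0, 3.0, 2.0, 2.0, 3.0, 1.0],  # unrelated to the target
}
target = [0, 0, 0, 1, 1, 1]

selected = select_features(columns, target)
print(selected)
```

In this toy data, `conductivity` is dropped as redundant with `salinity`, and `turbidity` is dropped as irrelevant. This is a filter-style approach; wrapper and embedded methods would instead evaluate subsets using a learning algorithm.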
