A New Fuzzy-Rough Hybrid Merit to Feature Selection

Feature selection is one of the most important preprocessing steps in machine learning, data mining, and bioinformatics. It counters the curse of dimensionality by reducing computational and storage costs, facilitates data understanding and visualization, and shortens training and testing times, improving overall performance, especially on large datasets. Correlation-based feature selection (CFS) evaluates candidate feature subsets with a conventional merit function. In this paper, we propose a new merit that adapts and combines correlation-based feature selection with fuzzy-rough feature selection to improve the effectiveness and quality of the conventional methods. The proposed method also outperforms the recently introduced gradient boosted feature selection by choosing more relevant and less redundant features. Two-step experiments demonstrate the applicability and efficiency of our method on well-known, widely used datasets as well as newly introduced ones, mainly from the UCI collection, ranging from small to large numbers of features and samples.
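For concreteness, the conventional CFS merit mentioned above scores a subset of k features as k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff the mean feature-feature intercorrelation. The sketch below is an illustrative implementation only, not the paper's hybrid merit; it assumes absolute Pearson correlation as the (hypothetical) relevance and redundancy measure:

```python
import numpy as np

def cfs_merit(X, y):
    """Illustrative sketch of the conventional CFS merit:
    merit = k * r_cf / sqrt(k + k*(k-1) * r_ff),
    with absolute Pearson correlation standing in for the
    relevance/redundancy measure (an assumption of this sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    # mean absolute feature-class correlation (relevance)
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(k)])
    # mean absolute pairwise feature-feature correlation (redundancy)
    if k > 1:
        r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for i in range(k) for j in range(i + 1, k)])
    else:
        r_ff = 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A subset of highly class-correlated but mutually uncorrelated features maximizes this merit; adding a redundant feature inflates r̄_ff and lowers the score, which is the behavior the hybrid merit builds on.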
