New approach for feature selection based on rough set and bat algorithm

This paper presents a new feature selection technique based on rough sets and the bat algorithm (BA). BA is attractive for feature selection because bats discover the best feature combinations as they fly within the feature subset space. Compared with genetic algorithms (GAs), BA does not need complex operators such as crossover and mutation; it requires only primitive, simple mathematical operators and is computationally inexpensive in terms of both memory and runtime. A fitness function based on rough sets is designed as the optimization target. This fitness function incorporates both classification accuracy and the number of selected features, and hence balances classification performance against reduction size. The paper uses four initialization strategies for starting the optimization and studies their effect on the performance of BA; these strategies reflect forward feature selection, backward feature selection, and a combination of both. Experiments on UCI data sets compare the proposed algorithm with GA-based and PSO-based approaches to rough-set-based feature reduction. The results on different data sets show that BA is efficient for rough-set-based feature selection, and that the rough-set-based fitness function achieves better classification results while keeping the number of selected features small.
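The abstract does not state the fitness function explicitly. A common rough-set formulation scores a feature subset R by its dependency degree γ_R(D) = |POS_R(D)| / |U|, the fraction of objects whose equivalence class under R maps to a single decision (used here as the classification-quality term), combined with the fraction of condition attributes removed. The Python sketch below assumes the weighted form alpha·γ_R(D) + (1 − alpha)·(|C| − |R|)/|C| with a hypothetical weight alpha = 0.9; the function names and the toy decision table are illustrative, and the paper's exact formula and weights may differ.

```python
from collections import defaultdict


def dependency_degree(rows, labels, subset):
    """Rough-set dependency gamma_R(D) = |POS_R(D)| / |U|.

    Objects that agree on every attribute index in `subset` share an
    equivalence class; a class belongs to the positive region only if
    all of its objects carry the same decision label.
    """
    labels_in_class = defaultdict(set)  # class key -> decision labels seen
    class_size = defaultdict(int)       # class key -> number of objects
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in subset)
        labels_in_class[key].add(label)
        class_size[key] += 1
    positive = sum(class_size[k] for k, seen in labels_in_class.items()
                   if len(seen) == 1)
    return positive / len(rows)


def rough_set_fitness(rows, labels, mask, alpha=0.9):
    """Assumed fitness: alpha * gamma + (1 - alpha) * share of attributes dropped.

    `mask` is a 0/1 list over the condition attributes; alpha = 0.9 is a
    hypothetical weight, not a value reported in the paper.
    """
    n = len(mask)
    subset = [a for a, bit in enumerate(mask) if bit]
    gamma = dependency_degree(rows, labels, subset)
    return alpha * gamma + (1 - alpha) * (n - len(subset)) / n


# Toy decision table (hypothetical): attribute 0 alone determines the decision.
rows = [(0, "a", 1), (0, "b", 0), (1, "a", 1), (1, "b", 0)]
labels = ["no", "no", "yes", "yes"]
print(dependency_degree(rows, labels, [0]))  # 1.0
print(dependency_degree(rows, labels, [1]))  # 0.0
print(rough_set_fitness(rows, labels, [1, 0, 0]) > rough_set_fitness(rows, labels, [1, 1, 1]))  # True
```

In the toy table, both {attribute 0} and the full attribute set have a dependency degree of 1.0, so it is the size term that makes the smaller subset score higher; this is the balance between classification quality and reduction size that the abstract describes.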
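The bat search over feature subsets can be sketched as a binary bat algorithm: each bat holds a 0/1 mask over the features, a randomly drawn frequency scales a velocity that pulls each bit toward the best mask found so far, a sigmoid turns each velocity component into the probability of selecting that feature, and loudness and pulse rate control move acceptance and local search around the best mask. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the sigmoid binarization, the one-bit-flip local search, every parameter default, and the feature fractions used for the "forward" (few features on), "backward" (most features on) and "mixed" (half of each) initialization styles are hypothetical choices.

```python
import math
import random


def init_population(n_bats, n_features, strategy, rng):
    """Hypothetical initialization of 0/1 feature masks.

    'forward'  : each bat starts with few features (about a quarter) switched on;
    'backward' : each bat starts with most features (three quarters or more) on;
    'mixed'    : even-indexed bats use the forward style, odd-indexed the backward one.
    """
    population = []
    for i in range(n_bats):
        style = strategy
        if strategy == "mixed":
            style = "forward" if i % 2 == 0 else "backward"
        if style == "forward":
            k = rng.randint(1, max(1, n_features // 4))
        elif style == "backward":
            k = rng.randint(max(1, 3 * n_features // 4), n_features)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        on = set(rng.sample(range(n_features), k))
        population.append([1 if j in on else 0 for j in range(n_features)])
    return population


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_bat(fitness, n_features, n_bats=10, n_iter=50, strategy="mixed",
               f_min=0.0, f_max=2.0, a0=0.9, r0=0.5, alpha=0.9, gamma=0.9,
               v_max=6.0, seed=0):
    """Maximize `fitness(mask)` over 0/1 masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = init_population(n_bats, n_features, strategy, rng)
    vel = [[0.0] * n_features for _ in range(n_bats)]
    fit = [fitness(p) for p in pos]
    loud = [a0] * n_bats   # loudness A_i, decays each time a bat accepts a move
    pulse = [r0] * n_bats  # pulse rate r_i; local search runs when a draw exceeds it
    b = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Random frequency scales the pull of each bit toward the best mask.
            freq = f_min + (f_max - f_min) * rng.random()
            cand = []
            for j in range(n_features):
                v = vel[i][j] + (best[j] - pos[i][j]) * freq
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp velocity
                # S-shaped transfer: probability that feature j is selected.
                cand.append(1 if rng.random() < sigmoid(vel[i][j]) else 0)
            # Local search: flip one random bit of the best mask.
            if rng.random() > pulse[i]:
                cand = best[:]
                k = rng.randrange(n_features)
                cand[k] = 1 - cand[k]
            new_fit = fitness(cand)
            # Accept a non-worsening move with probability equal to the loudness.
            if new_fit >= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, new_fit
                loud[i] *= alpha
                pulse[i] = r0 * (1 - math.exp(-gamma * t))
                if new_fit > best_fit:
                    best, best_fit = cand[:], new_fit
    return best, best_fit


# Usage with a toy objective: count agreements with a hypothetical target mask.
target = [1, 0, 1, 0, 0, 1]
mask, score = binary_bat(lambda m: sum(a == b for a, b in zip(m, target)), n_features=6)
print(mask, score)
```

The search takes any callable that scores a 0/1 mask, so for rough-set feature selection it would wrap a dependency-based score over the data set; passing `strategy="forward"`, `"backward"` or `"mixed"` is one way to compare how the starting population affects the result.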
