A new approach to exploring rough set boundary region for feature selection

Feature selection offers a crucial way to remove irrelevant and misleading features from a given problem, while retaining the underlying semantics of the selected features. While maintaining the quality of problem-solving (e.g., classification), a superior feature selection process should reduce the number of attributes as much as possible. In this paper, a non-unique decision value (NDV), defined as the number of attribute values that can lead to non-unique decision values, is proposed to rapidly capture the uncertainty in the boundary region of a granular space. In addition, an NDV-based differentiation entropy (NDE) is introduced as an evaluator of the selected feature subset, and is used to implement a novel feature selection process. The experimental results demonstrate that the features selected by the proposed approach outperform those obtained by other state-of-the-art feature selection methods, with respect to both the size of the reduction and the classification accuracy.
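The NDV idea can be illustrated with a small sketch. This is one plausible reading of the definition given in the abstract, not the paper's formal definition: it counts the distinct value combinations of a candidate feature subset whose objects carry more than one decision value. The function name `non_unique_decision_count`, the tuple-based table layout, and the toy data are all assumptions made for illustration; the NDE evaluator is not sketched because the abstract does not define it.

```python
from collections import defaultdict


def non_unique_decision_count(rows, feature_idx, decision_idx):
    """Count value combinations over feature_idx that map to several decisions.

    Hypothetical helper: an interpretation of NDV based on the abstract only.
    """
    decisions_by_value = defaultdict(set)
    for row in rows:
        # Objects sharing the same values on the chosen features fall
        # into the same group (equivalence class).
        key = tuple(row[i] for i in feature_idx)
        decisions_by_value[key].add(row[decision_idx])
    # A group with more than one decision value is inconsistent, i.e. it
    # contributes to the boundary region.
    return sum(1 for seen in decisions_by_value.values() if len(seen) > 1)


# Toy decision table (made up): columns are (outlook, windy, play).
table = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "no", "yes"),
    ("rainy", "no", "no"),
    ("rainy", "yes", "no"),
]

ndv_outlook = non_unique_decision_count(table, [0], 2)
ndv_both = non_unique_decision_count(table, [0, 1], 2)
print(ndv_outlook, ndv_both)
```

Under this reading, a lower count means the feature subset leaves fewer ambiguous value combinations, so adding `windy` to `outlook` in the toy table reduces the uncertainty.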
