NEC: A nested equivalence class-based dependency calculation approach for fast feature selection using rough set theory

Abstract Feature selection plays an important role in data mining and machine learning tasks. As one of the most effective methods for feature selection, rough set theory provides a systematic theoretical framework for consistency-based feature selection, in which positive region-based dependency calculation is the most important step. However, this calculation is computationally expensive, and although many improved algorithms have been proposed, they remain time-consuming. To overcome this shortcoming, this study introduces a nested equivalence class (NEC) approach to dependency calculation. The proposed method starts from the finest partition of the universe and then extracts and exploits the knowledge already known about reducts in a decision table to construct an NEC. This not only simplifies the dependency calculation but also, in most cases, shrinks the universe accordingly. Using the proposed NEC-based approach, a number of representative heuristic- and swarm intelligence-based feature selection algorithms built on rough set theory were accelerated, and each modified algorithm selects exactly the same feature subset as its original counterpart. Experiments on 33 datasets from the UCI repository and the KDD Cup competition, including large-scale and high-dimensional datasets, demonstrate the efficiency and effectiveness of the proposed method.
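To make the quantity that NEC accelerates concrete, the following is a minimal sketch (not the authors' NEC implementation) of the classical positive region-based dependency calculation from rough set theory: the universe is partitioned into equivalence classes by the chosen condition attributes, and the dependency degree is the fraction of objects lying in classes that are consistent with the decision. The function and variable names here are illustrative only.

```python
# A minimal sketch of classical positive region-based dependency calculation:
# gamma_B(D) = |POS_B(D)| / |U|, where POS_B(D) is the union of the
# B-equivalence classes whose objects all share one decision value.
from collections import defaultdict
from typing import Sequence


def dependency(samples: Sequence[Sequence], decisions: Sequence, attrs: Sequence[int]) -> float:
    """Dependency degree of the decision on the condition attributes `attrs`.

    `samples` is the universe U (one row per object), `decisions` holds the
    decision value of each object, and `attrs` indexes the condition
    attribute subset B used to partition U into equivalence classes.
    """
    # Partition U by the attribute values on B (equivalence classes).
    classes = defaultdict(list)
    for i, row in enumerate(samples):
        key = tuple(row[a] for a in attrs)
        classes[key].append(i)

    # An equivalence class belongs to the positive region iff all of its
    # objects share the same decision value (i.e., it is consistent).
    pos_size = sum(
        len(members)
        for members in classes.values()
        if len({decisions[i] for i in members}) == 1
    )
    return pos_size / len(samples)


# Toy example: the two condition attributes fully determine the decision,
# so the dependency degree is 1.0.
U = [(0, 1), (0, 1), (1, 0), (1, 1)]
d = ["yes", "yes", "no", "no"]
print(dependency(U, d, attrs=[0, 1]))  # -> 1.0
```

This brute-force computation repartitions the whole universe for every candidate attribute subset, which is why it dominates the running time of consistency-based feature selection; the NEC approach described in the paper avoids this cost by reusing the nested structure of the equivalence classes and reducing the universe as the search proceeds.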
