A novel community detection based genetic algorithm for feature selection

Feature selection is an essential data preprocessing stage in data mining. Its core principle is to pick a subset of the available features by excluding features with almost no predictive information as well as highly correlated, redundant features. In the past several years, a variety of meta-heuristic methods have been introduced to eliminate as many redundant and irrelevant features as possible from high-dimensional datasets. One of the main disadvantages of existing meta-heuristic approaches is that they often neglect the correlation among the selected features. In this article, the authors propose a genetic algorithm for feature selection based on community detection, which works in three steps. In the first step, the feature similarities are calculated. In the second step, community detection algorithms group the features into clusters. In the third step, a genetic algorithm with a new community-based repair operation picks the features. The performance of the presented approach was analyzed on nine benchmark classification problems. The authors also compared the efficiency of the proposed approach with that of four existing feature selection algorithms. The findings indicate that the new approach consistently yields improved classification accuracy.
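The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure, the community detection algorithm, the fitness function, or the exact repair rule, so the sketch assumes absolute Pearson correlation, connected components of a thresholded similarity graph as a simple stand-in for community detection, a caller-supplied fitness function, and a repair step that keeps a fixed number of features per community. All function names and parameters (`pearson_similarity`, `detect_communities`, `repair`, `genetic_select`, `threshold`, `per_community`) are hypothetical.

```python
import math
import random


def pearson_similarity(columns):
    """Step 1 (assumed measure): absolute Pearson correlation per feature pair."""
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

    n = len(columns)
    return [[1.0 if i == j else corr(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


def detect_communities(sim, threshold=0.7):
    """Step 2 (stand-in): connected components of the thresholded graph."""
    n = len(sim)
    seen = [False] * n
    communities = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            u = stack.pop()
            members.append(u)
            for v in range(n):
                if not seen[v] and sim[u][v] >= threshold:
                    seen[v] = True
                    stack.append(v)
        communities.append(sorted(members))
    return communities


def repair(chromosome, communities, per_community, rng):
    """Step 3 helper (assumed rule): keep `per_community` features per community."""
    fixed = list(chromosome)
    for members in communities:
        target = min(per_community, len(members))
        chosen = [f for f in members if fixed[f]]
        unchosen = [f for f in members if not fixed[f]]
        rng.shuffle(chosen)
        rng.shuffle(unchosen)
        while len(chosen) > target:      # too many similar features: drop some
            fixed[chosen.pop()] = 0
        while len(chosen) < target:      # community unrepresented: add some
            f = unchosen.pop()
            fixed[f] = 1
            chosen.append(f)
    return fixed


def genetic_select(n_features, communities, fitness, per_community=1,
                   pop_size=20, generations=30, seed=0):
    """Step 3: a plain genetic algorithm whose offspring are always repaired."""
    rng = random.Random(seed)
    pop = [repair([rng.randint(0, 1) for _ in range(n_features)],
                  communities, per_community, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)        # single bit-flip mutation
            child[i] = 1 - child[i]
            children.append(repair(child, communities, per_community, rng))
        pop = parents + children
    return max(pop, key=fitness)
```

In a wrapper setting, `fitness` would typically train a classifier on the selected columns and return its validation accuracy; here it is left to the caller. Because every individual passes through `repair`, no chromosome can select more than `per_community` features from the same group of mutually similar features, which is one way to address the neglected correlation among selected features that the abstract identifies.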
