An improved feature selection algorithm based on graph clustering and ant colony optimization

Abstract Dimensionality reduction is an important preprocessing step for improving the performance of machine learning algorithms. Feature selection methods can speed up the learning process and improve overall classification accuracy by reducing computational complexity. Among feature selection methods, multivariate methods are more effective at removing irrelevant and redundant features. An efficient multivariate feature selection method called graph clustering based ant colony optimization (GCACO) was recently introduced and shown to outperform other well-known feature selection methods. In GCACO, the entire feature space is represented as a graph, and the features are divided into communities (clusters) by an efficient community detection algorithm. An ACO-based search strategy is then used to select an optimal feature subset from the initial set of features. In this paper, a modified GCACO algorithm, called MGCACO, is presented to significantly improve the performance of GCACO. The performance of the MGCACO algorithm was assessed on several standard benchmark datasets and on sleep EEG data, and compared with that of the original GCACO and other well-known filter methods from the literature. MGCACO outperformed GCACO and other univariate and multivariate algorithms by up to 10%. MGCACO was also more efficient at reducing the number of features while keeping the classification accuracy at a maximum.
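The two-stage pipeline the abstract describes (represent the features as a graph, cluster the graph, then search the clusters for a subset) can be illustrated with a much-simplified sketch. This is not the authors' implementation: here, connected components of a thresholded correlation graph stand in for the community detection algorithm used in GCACO, and a greedy per-cluster pick stands in for the full ACO search; all function names and the 0.7 threshold are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def cluster_features(X, threshold=0.7):
    """Build a feature graph (edge = |correlation| >= threshold) and return
    its connected components as feature clusters (lists of column indices).
    A simplified stand-in for the community detection step in GCACO."""
    n = len(X[0])
    cols = [[row[j] for row in X] for j in range(n)]
    adj = {j: set() for j in range(n)}
    for i, j in combinations(range(n), 2):
        if abs(pearson(cols[i], cols[j])) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, stack = [], [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                comp.append(v)
                stack.extend(adj[v] - seen)
        clusters.append(sorted(comp))
    return clusters

def select_representatives(X, y, clusters):
    """Greedy stand-in for the ACO search: from each cluster, keep the single
    feature most correlated with the class label y."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    return [max(c, key=lambda j: abs(pearson(cols[j], y))) for c in clusters]

# Toy data: feature 1 duplicates feature 0 (redundant), feature 2 is unrelated.
X = [[0.1, 0.2, 0.5], [0.9, 1.8, 0.5], [0.2, 0.4, 0.4],
     [0.8, 1.6, 0.4], [1.0, 2.0, 0.6], [0.0, 0.0, 0.6]]
y = [0, 1, 0, 1, 1, 0]
clusters = cluster_features(X)            # → [[0, 1], [2]]
selected = select_representatives(X, y, clusters)  # one index per cluster
```

Grouping redundant features first is what distinguishes this family of multivariate methods from univariate filters: the two duplicated columns land in one cluster, so at most one of them can survive the selection step.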
