Adaptive multi-subswarm optimisation for feature selection on high-dimensional classification

Feature space is an important factor influencing the performance of any machine learning algorithm including classification methods. Feature selection aims to remove irrelevant and redundant features that may negatively affect the learning process especially on high-dimensional data, which usually suffers from the curse of dimensionality. Feature ranking is one of the most scalable feature selection approaches to high-dimensional problems, but most of them fail to automatically determine the number of selected features as well as detect redundancy between features. Particle swarm optimisation (PSO) is a population-based algorithm which has shown to be effective in addressing these limitations. However, its performance on high-dimensional data is still limited due to the large search space and high computation cost. This study proposes the first adaptive multi-swarm optimisation (AMSO) method for feature selection that can automatically select a feature subset of high-dimensional data more effectively and efficiently than the compared methods. The subswarms are automatically and dynamically changed based on their performance during the evolutionary process. Experiments on ten high-dimensional datasets of varying difficulties have shown that AMSO is more effective and more efficient than the compared PSO-based and traditional feature selection methods in most cases.

[1]  S. C. Neoh,et al.  A Micro-GA Embedded PSO Feature Selection Approach to Intelligent Facial Emotion Recognition , 2017, IEEE Transactions on Cybernetics.

[2]  Parham Moradi,et al.  A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy , 2016, Appl. Soft Comput..

[3]  Haider Banka,et al.  A Hamming distance based binary particle swarm optimization (HDBPSO) algorithm for high dimensional feature selection, classification and validation , 2015, Pattern Recognit. Lett..

[4]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[5]  Riccardo Poli,et al.  Particle swarm optimization , 1995, Swarm Intelligence.

[6]  Charu C. Aggarwal,et al.  On the Surprising Behavior of Distance Metrics in High Dimensional Spaces , 2001, ICDT.

[7]  Mohd Saberi Mohamad,et al.  A Modified Binary Particle Swarm Optimization for Selecting the Small Subset of Informative Genes From Gene Expression Data , 2011, IEEE Transactions on Information Technology in Biomedicine.

[8]  Mário A. T. Figueiredo,et al.  Efficient feature selection filters for high-dimensional data , 2012, Pattern Recognit. Lett..

[9]  Ausama Al-Sahaf,et al.  Automatically Evolving Rotation-Invariant Texture Image Descriptors by Genetic Programming , 2017, IEEE Transactions on Evolutionary Computation.

[10]  Vinod Kumar Jain,et al.  Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification , 2018, Appl. Soft Comput..

[11]  Mark A. Hall,et al.  Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning , 1999, ICML.

[12]  Xin Yao,et al.  A Survey on Evolutionary Computation Approaches to Feature Selection , 2016, IEEE Transactions on Evolutionary Computation.

[13]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[14]  Yaochu Jin,et al.  Feature selection for high-dimensional classification using a competitive swarm optimizer , 2016, Soft Computing.

[15]  Kewei Cheng,et al.  Feature Selection , 2016, ACM Comput. Surv..

[16]  Eibe Frank,et al.  Large-scale attribute selection using wrappers , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[17]  Li-Yeh Chuang,et al.  Improved binary particle swarm optimization using catfish effect for feature selection , 2011, Expert Syst. Appl..

[18]  Yaochu Jin,et al.  A Competitive Swarm Optimizer for Large Scale Optimization , 2015, IEEE Transactions on Cybernetics.

[19]  Constantin F. Aliferis,et al.  A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis , 2004, Bioinform..

[20]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[21]  Mengjie Zhang,et al.  A PSO based hybrid feature selection algorithm for high-dimensional classification , 2016, 2016 IEEE Congress on Evolutionary Computation (CEC).

[22]  Li-Yeh Chuang,et al.  Boolean binary particle swarm optimization for feature selection , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).