Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms

In classification, feature selection is an important data pre-processing technique, but it is a difficult problem due mainly to the large search space. Particle swarm optimisation (PSO) is an efficient evolutionary computation technique. However, the traditional personal best and global best updating mechanism in PSO limits its performance for feature selection, and the potential of PSO for feature selection has not been fully investigated. This paper proposes three new initialisation strategies and three new personal best and global best updating mechanisms in PSO to develop novel feature selection approaches with the goals of maximising the classification performance, minimising the number of features and reducing the computational time. The proposed initialisation strategies and updating mechanisms are compared with the traditional initialisation and the traditional updating mechanism. The most promising initialisation strategy and updating mechanism are then combined to form a new approach, PSO(4-2), for feature selection problems, which is compared with two traditional feature selection methods and two PSO-based methods. Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with fewer features and higher classification performance than using all features. PSO(4-2) outperforms the two traditional methods and the two PSO-based algorithms in terms of computational time, the number of features and classification performance.
The superior performance of this algorithm is due mainly to two components: the proposed initialisation strategy, which draws on the advantages of both forward selection and backward selection to decrease the number of features and the computational time, and the new updating mechanism, which overcomes the limitations of the traditional mechanism by taking the number of features into account, further reducing both the feature subset size and the computational time.
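The two components described above can be sketched in a minimal binary PSO for feature selection. This is an illustrative sketch only, not the paper's exact algorithm: the mixed initialisation (most particles start with small, forward-selection-like subsets; the rest with large, backward-selection-like subsets) and the size-aware personal/global best update (equal classification performance broken in favour of fewer features) follow the ideas in the abstract, while the toy fitness function, the fraction of small-subset particles and all parameter values are assumptions.

```python
import math
import random

def initialise_swarm(n_particles, n_features, small_frac=2/3, rng=None):
    """Mixed initialisation sketch: a majority of particles start from small
    feature subsets (forward-selection-like), the remainder from large
    subsets (backward-selection-like). small_frac is an assumed value."""
    rng = rng or random.Random()
    swarm = []
    n_small = int(n_particles * small_frac)
    for i in range(n_particles):
        if i < n_small:  # few features switched on
            k = rng.randint(1, max(1, n_features // 10))
        else:            # most features switched on
            k = rng.randint(n_features // 2, n_features)
        on = set(rng.sample(range(n_features), k))
        swarm.append([1 if j in on else 0 for j in range(n_features)])
    return swarm

def is_better(cand_fit, cand_size, best_fit, best_size):
    """Size-aware update rule sketch: accept strictly better classification
    performance, or equal performance achieved with fewer features."""
    return cand_fit > best_fit or (cand_fit == best_fit and cand_size < best_size)

def pso_feature_selection(fitness, n_features, n_particles=20, iters=30, seed=0):
    """Binary PSO with sigmoid transfer function; parameters are assumptions."""
    rng = random.Random(seed)
    w, c1, c2, vmax = 0.7298, 1.49618, 1.49618, 6.0
    pos = initialise_swarm(n_particles, n_features, rng=rng)
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: (pbest_fit[i], -sum(pbest[i])))
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, vel[i][d]))
                s = 1.0 / (1.0 + math.exp(-vel[i][d]))  # sigmoid transfer
                pos[i][d] = 1 if rng.random() < s else 0
            fit, size = fitness(pos[i]), sum(pos[i])
            if is_better(fit, size, pbest_fit[i], sum(pbest[i])):
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if is_better(fit, size, gbest_fit, sum(gbest)):
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

In practice the fitness function would wrap a classifier evaluated on a training set (a wrapper approach); here any callable mapping a 0/1 mask to a score can be plugged in. The tie-breaking in `is_better` is what lets the swarm keep shrinking the subset once classification performance plateaus.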
