Evolutionary Computation for Feature Selection in Classification

Classification assigns a class label to an instance based on its characteristics, or features. Many classification problems have large feature sets that contain irrelevant and redundant features, which degrade classification performance. Feature selection addresses this problem by selecting a small subset of relevant features. There are three main types of feature selection methods: wrapper, embedded, and filter approaches. Wrappers use a classification algorithm to evaluate candidate feature subsets. In embedded approaches, selection is built into the training process of a classification algorithm. Filters, unlike the other two, do not involve any classification algorithm during selection. Feature selection is important but difficult because of its large search space and complex feature interactions. Because of its potential global search ability, Evolutionary Computation (EC), especially Particle Swarm Optimization (PSO), has been widely and successfully applied to feature selection. However, there is still room to improve the effectiveness and efficiency of EC-based feature selection.

The overall goal of this thesis is to investigate and improve the capability of EC for feature selection, so that it selects small feature subsets while maintaining or even improving classification performance compared with using all features. The thesis considers several aspects of feature selection: the number of objectives (single-objective or multi-objective), the fitness function (filter or wrapper), and the search mechanism.

This thesis introduces a new fitness function based on mutual information, which is calculated by an estimation approach instead of the traditional counting approach. Results show that the estimation approach works well on both continuous and discrete data.
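For readers unfamiliar with the counting approach, the following is a minimal sketch of a count-based (plug-in) mutual information estimate, assuming discrete values. The function names `mutual_information_counting` and `equal_width_bins` and the toy data are illustrative assumptions, not taken from the thesis. The sketch does not implement the estimation approach, whose details the abstract does not give. It only shows why counting requires continuous features to be discretized first, and what a feature interaction looks like: two features that are individually uninformative about the label but jointly determine it.

```python
import math
from collections import Counter


def mutual_information_counting(xs, ys):
    """Plug-in estimate of I(X;Y) in bits, using relative frequencies as probabilities."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def equal_width_bins(values, n_bins):
    """Discretize continuous values into n_bins equal-width bins (needed before counting)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Toy XOR data: the label is the exclusive-or of the two features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [0, 1, 1, 0]

mi_f1 = mutual_information_counting(f1, label)                    # 0.0 bits
mi_f2 = mutual_information_counting(f2, label)                    # 0.0 bits
mi_joint = mutual_information_counting(list(zip(f1, f2)), label)  # 1.0 bit
```

Scoring each feature on its own would discard both `f1` and `f2` here, although together they predict the label perfectly. With continuous data, the result of counting also depends on the binning chosen in `equal_width_bins`, which is one motivation for estimating mutual information without discretization.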
More importantly, mutual information calculated by the estimation approach captures feature interactions better than the traditional counting approach.

This thesis develops a novel binary PSO algorithm, the first work to redefine core PSO concepts such as velocity and momentum to suit the characteristics of binary search spaces. Experimental results show that the proposed binary PSO algorithm evolves better solutions than other binary EC algorithms when the search spaces are large and complex. In feature selection specifically, it selects smaller feature subsets with similar or better classification accuracy, especially when the number of features is large.

This thesis proposes surrogate models for wrapper-based feature selection. The surrogate models use surrogate training sets, which are subsets of informative instances selected from the training set. Experimental results show that the proposed surrogate models help PSO reduce computational cost while maintaining or even improving classification performance compared with using only the original training set.

The thesis develops the first wrapper-based multi-objective feature selection algorithm using MOEA/D. A new decomposition strategy for MOEA/D, based on multiple reference points, is designed to handle characteristics of multi-objective feature selection such as highly discontinuous Pareto fronts and complex relationships between objectives. Experimental results show that the proposed algorithm evolves more diverse non-dominated sets than other multi-objective algorithms.

This thesis introduces the first PSO-based feature selection algorithm for transfer learning. In the proposed algorithm, the fitness function uses classification performance to reduce the differences between domains while maintaining discriminative ability on the target domain.
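As background for the multi-objective work summarized earlier, here is a minimal sketch of how a non-dominated set is extracted when feature selection is treated as minimizing two objectives: classification error and number of selected features. The function names and the candidate values are illustrative assumptions, not results from the thesis, and this is plain Pareto filtering, not the MOEA/D decomposition strategy the thesis proposes.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    Both objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate subsets as (classification error, number of features).
candidates = [(0.10, 12), (0.12, 5), (0.10, 15), (0.20, 5), (0.08, 30)]
front = non_dominated(candidates)
# front == [(0.10, 12), (0.12, 5), (0.08, 30)]
```

Because the number of features is an integer, such fronts consist of isolated points, which is one way to read the "highly discontinuous Pareto fronts" mentioned in the abstract.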
The experimental results show that the proposed algorithm selects feature subsets that achieve better classification performance than four state-of-the-art feature-based transfer learning algorithms.

List of Publications

• Hoai Bach Nguyen, Bing Xue, and Peter Andreae. “PSO with Surrogate Models for Feature Selection: Static and Dynamic Clustering-based Methods”, Memetic Computing, 07 March 2018 (Online). https://doi.org/10.1007/s12293-018-0254-9
• Hoai Bach Nguyen, Bing Xue, and Peter Andreae. “Mutual Information for Feature Selection: Estimation or Counting?”, Evolutionary Intelligence, vol. 9, no. 3, pp. 95-110, 2016.
• Hoai Bach Nguyen, Bing Xue, Peter Andreae, and Mengjie Zhang. “A New Binary Particle Swarm Optimization Approach: Momentum and Dynamic Balance Between Exploration and Exploitation”. Submitted to IEEE Transactions on Cybernetics (under revise and resubmit).
• Hoai Bach Nguyen, Bing Xue, Hisao Ishibuchi, Peter Andreae, and Mengjie Zhang. “Multiple Reference Points based Decomposition for Multi-objective Feature Selection in Classification: Static and Dynamic Mechanisms”. Submitted to IEEE Transactions on Evolutionary Computation (under revise and resubmit).
• Hoai Bach Nguyen, Bing Xue, and Peter Andreae. “A Particle Swarm Optimization based Feature Selection Approach to Transfer Learning in Classification”. Proceedings of 2018 Genetic and Evolutionary
