Multi-Objective Feature Selection With Missing Data in Classification

Feature selection (FS) is an important research topic in machine learning. FS is usually modelled as a bi-objective optimization problem whose objectives are 1) classification accuracy and 2) number of features. A major issue in real-world applications is missing data: databases with missing values are likely to be unreliable, so FS performed on an incomplete data set is also unreliable. To control this issue directly, we propose in this study a novel modelling of FS that includes reliability as a third objective. To address the modified problem, we apply the non-dominated sorting genetic algorithm III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository and used mean imputation to deal with the missing data. In the experiments, k-nearest neighbors (K-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
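The three-objective evaluation of a single candidate feature subset can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study. The abstract does not define the reliability objective, so the sketch uses the fraction of originally missing entries among the selected features as a stand-in. The function names (`mean_impute`, `knn_loo_error`, `evaluate_subset`), the leave-one-out error estimate, and the default `k=3` are likewise hypothetical choices. All three returned values are to be minimised.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def knn_loo_error(X, y, k=3):
    """Leave-one-out error rate of a plain K-NN classifier (Euclidean distance)."""
    y = np.asarray(y)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample may not vote for itself
    neighbours = np.argsort(dists, axis=1)[:, :k]
    wrong = 0
    for i, idx in enumerate(neighbours):
        labels, counts = np.unique(y[idx], return_counts=True)
        wrong += int(labels[np.argmax(counts)] != y[i])  # majority vote
    return wrong / len(y)

def evaluate_subset(mask, X_raw, y, k=3):
    """Return (error rate, feature ratio, missing rate) for a boolean feature mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0, 0.0, 1.0  # empty subset: penalise error and reliability
    X_raw = np.asarray(X_raw, dtype=float)
    # Assumed reliability proxy: share of missing entries in the selected columns.
    missing_rate = np.isnan(X_raw[:, mask]).mean()
    error = knn_loo_error(mean_impute(X_raw)[:, mask], y, k)
    return float(error), float(mask.mean()), float(missing_rate)
```

A multi-objective optimizer such as NSGA-III would call `evaluate_subset` once per candidate mask and rank the resulting objective triples by non-domination. The optimizer itself is outside the scope of this sketch.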
