A many-objective feature selection for multi-label classification

Abstract Feature selection is an important task in machine learning. As multi-label classification tasks arise in many fields, researchers have investigated multi-label feature selection algorithms to reduce data dimensionality. Most existing wrapper-based multi-label feature selection algorithms use multi-objective methods to select features. However, the quality of a multi-label classification result is measured by several criteria at once. In view of this, this study presents a many-objective optimization-based multi-label feature selection algorithm (MMFS). To improve the diversity and convergence of NSGA-III, we propose an improved NSGA-III algorithm with two archives. In this algorithm, new crossover and mutation operators for feature selection are designed to improve exploration capability, and we study how the selection threshold θ, used under real-number coding, affects the number of selected features and multi-label classification performance. Finally, we conduct experiments on 11 multi-label datasets. The experiments show that MMFS can balance multiple objectives, remove irrelevant and redundant features, and obtain satisfactory classification results.
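To make the role of the threshold θ and of multiple objectives concrete, the following is a minimal sketch and not the MMFS implementation. It assumes that, under real-number coding, each gene lies in [0, 1] and a feature is selected when its gene exceeds θ; the abstract does not state the exact decoding rule, so this convention is an assumption. The function names `decode_features` and `dominates`, the example chromosome, and the minimization convention for objectives are likewise hypothetical.

```python
from typing import List, Sequence


def decode_features(chromosome: Sequence[float], theta: float) -> List[int]:
    """Return indices of selected features.

    Assumed rule: feature i is selected when chromosome[i] > theta.
    """
    return [i for i, gene in enumerate(chromosome) if gene > theta]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization over any number of objectives.

    a dominates b when a is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


if __name__ == "__main__":
    # Hypothetical real-coded individual over five features.
    chromosome = [0.91, 0.12, 0.55, 0.47, 0.78]

    # A larger theta keeps fewer features.
    for theta in (0.3, 0.5, 0.7):
        print(theta, decode_features(chromosome, theta))

    # Hypothetical objective vectors (all to be minimized).
    print(dominates([0.1, 0.2, 0.3, 0.4], [0.1, 0.3, 0.3, 0.5]))
```

In this sketch, raising θ can only shrink the selected subset, which is why the threshold affects both feature scale and classification performance. With many objectives, fewer pairs of solutions are comparable under dominance, which is one common motivation for reference-point methods such as NSGA-III.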
