Multi-objective feature selection using hybridization of a genetic algorithm and direct multisearch for key quality characteristic selection

Abstract A multi-objective feature selection approach is proposed for selecting key quality characteristics (KQCs) from unbalanced production data. We define KQC (feature) selection as a bi-objective problem: maximizing the importance of the quality characteristic (QC) subset and minimizing its size. Three candidate feature importance measures, the geometric mean (GM), the F1 score and accuracy, are applied to construct three KQC selection models. To solve the models, we propose a two-phase optimization method. In the first phase, a novel multi-objective optimization method (GADMS) selects the candidate solutions (QC subsets); in the second phase, the ideal point method (IPM) selects the final KQC set from these candidates. GADMS is a hybrid method composed of a genetic algorithm (GA) and a local search strategy named direct multisearch (DMS). In GADMS, we combine binary encoding with real-value encoding to exploit the advantages of both the GA and DMS. Experimental results on four production datasets show that the proposed method with GM performs best in handling the data imbalance problem and outperforms the benchmark methods. Moreover, GADMS achieves significantly better search performance than the benchmark multi-objective optimization methods, which include a modified nondominated sorting genetic algorithm II (NSGA-II), two multi-objective particle swarm optimization algorithms and an improved DMS method.
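As a rough illustration of the quantities named in the abstract, the sketch below computes accuracy, the F1 score and GM from binary labels, and then picks one solution from a set of candidate (importance, subset size) pairs by distance to an ideal point. The abstract does not give the exact formulation, so several details here are assumptions: GM is taken as the square root of sensitivity times specificity, and the ideal point step uses Euclidean distance on min-max normalized objectives. All function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def geometric_mean(y_true, y_pred):
    # Assumed definition: sqrt(sensitivity * specificity).
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def ideal_point_choice(candidates):
    """Pick the index of the candidate closest to the ideal point.

    candidates: list of (importance, subset_size) pairs, where importance
    is maximized and subset_size is minimized. Assumption: both objectives
    are min-max normalized to [0, 1] and Euclidean distance is used.
    """
    imps = [c[0] for c in candidates]
    sizes = [c[1] for c in candidates]
    imp_range = max(imps) - min(imps)
    size_range = max(sizes) - min(sizes)

    def distance(c):
        # Normalized shortfall from the best importance and the smallest size.
        d_imp = (max(imps) - c[0]) / imp_range if imp_range else 0.0
        d_size = (c[1] - min(sizes)) / size_range if size_range else 0.0
        return math.hypot(d_imp, d_size)

    return min(range(len(candidates)), key=lambda i: distance(candidates[i]))


# Imbalanced toy data: 95 negatives, 5 positives, classifier predicts all negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), geometric_mean(y_true, y_pred))

# Hypothetical candidate QC subsets as (importance, subset size).
print(ideal_point_choice([(0.90, 10), (0.88, 4), (0.70, 2)]))
```

On the toy data, a classifier that ignores the minority class still scores high accuracy while GM and F1 both collapse to zero, which is one way to see why an imbalance-aware measure may be preferable as the importance objective.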
