Cooperative co-evolution for feature selection in Big Data with random feature grouping

Modern technologies generate massive amounts of data. This high-throughput data generation produces Big Data, whose datasets often contain a large number of features (attributes). However, irrelevant features can degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique for selecting a subset of relevant features that represents the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. Cooperative co-evolution (CC), a variant of EAs that follows a divide-and-conquer approach, is well suited to optimization problems with many decision variables. Existing solutions perform poorly because of several limitations: they ignore feature interactions, handle only an even number of features, and decompose the dataset statically. In this paper, a novel random feature grouping (RFG) method with three variants is introduced. RFG decomposes Big Data datasets dynamically and increases the probability that interacting features are grouped into the same subcomponent. RFG can be used in CC-based FS processes; the resulting approach is called Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). The experimental analysis compared results with and without FS, using six widely used ML classifiers on seven datasets from the UCI ML repository and the Princeton University Genomics repository. The results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-nearest neighbor (k-NN), J48, and random forest (RF)], the proposed CCFSRFG-1 outperforms an existing CC-based FS solution (CCEAFS), CCFSRFG-2, and the use of all features in terms of accuracy, sensitivity, and specificity.
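The idea of dynamic random grouping can be illustrated with a small sketch. It assumes a generic random-grouping scheme: feature indices are reshuffled and split into subcomponents at the start of every co-evolutionary cycle. It is not the paper's three RFG variants, whose details the abstract does not give. The function names `random_feature_grouping`, `same_group_probability`, and `prob_together_at_least_once` are hypothetical. Group sizes differ by at most one, so an odd number of features is handled without padding. The probability helpers assume equal-sized groups. They show why regrouping every cycle raises the chance that two interacting features share a subcomponent at least once.

```python
import random
from typing import List, Optional


def random_feature_grouping(n_features: int, n_groups: int,
                            rng: Optional[random.Random] = None) -> List[List[int]]:
    """Hypothetical helper: randomly partition feature indices 0..n_features-1
    into n_groups subcomponents whose sizes differ by at most one."""
    if not 1 <= n_groups <= n_features:
        raise ValueError("need 1 <= n_groups <= n_features")
    rng = rng or random.Random()
    indices = list(range(n_features))
    rng.shuffle(indices)                        # fresh random order on every call
    base, extra = divmod(n_features, n_groups)  # handles odd / non-divisible counts
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)   # first `extra` groups get one more
        groups.append(sorted(indices[start:start + size]))
        start += size
    return groups


def same_group_probability(n_features: int, group_size: int) -> float:
    """P(two fixed features share a group) under a uniform random partition into
    equal groups of `group_size`: the partner must fill one of the other
    group_size - 1 slots out of n_features - 1 remaining features."""
    return (group_size - 1) / (n_features - 1)


def prob_together_at_least_once(p: float, cycles: int) -> float:
    """Chance that two features share a group in at least one of `cycles`
    independent regroupings, each with single-cycle probability p."""
    return 1.0 - (1.0 - p) ** cycles


if __name__ == "__main__":
    rng = random.Random(42)
    for cycle in range(2):  # regroup at the start of each co-evolutionary cycle
        print(cycle, random_feature_grouping(11, 3, rng))
    p = same_group_probability(100, 10)
    print(p, prob_together_at_least_once(p, 50))
```

With 100 features split into groups of 10, a fixed pair shares a group in a single cycle with probability 9/99 (about 0.09). Over 50 independent regroupings, the chance that they are grouped together at least once rises above 0.99. A static decomposition cannot offer this.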
