bSSA: Binary Salp Swarm Algorithm With Hybrid Data Transformation for Feature Selection

Feature selection is a technique commonly used in data mining and machine learning. Traditional feature selection methods, when applied to large datasets, generate a large number of candidate feature subsets. Searching for the optimal features in such a high-dimensional space is time-consuming and degrades system performance. This paper proposes a new binary Salp Swarm Algorithm (bSSA) for selecting the best feature set from transformed datasets. The proposed method first transforms the original dataset with a hybrid data transformation based on Principal Component Analysis (PCA) and fast Independent Component Analysis (fastICA); a binary Salp Swarm optimizer then searches the transformed data for the best features. The approach improves accuracy and eliminates the selection of irrelevant features. We validate the technique on fifteen benchmark datasets and conduct an extensive study of its performance and feature selection accuracy. The proposed bSSA is compared with the Binary Genetic Algorithm (bGA), Binary Binomial Cuckoo Search (bBCS), Binary Grey Wolf Optimizer (bGWO), Binary Competitive Swarm Optimizer (bCSO), and Binary Crow Search Algorithm (bCSA). On PCA-fastICA transformed datasets, the proposed method attains a mean accuracy of 95.26% while selecting 7.78% of the features. The results show that bSSA outperforms the existing methods on the majority of the performance measures.
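To make the second stage concrete, the following is a minimal sketch of how a binary salp swarm search over feature masks might look. It is not the authors' implementation: the abstract does not specify the binarization rule or the fitness function, so the sigmoid transfer function, the search bounds, the weight `alpha`, and the `toy_fitness` stand-in (which replaces a classifier-based error with a count of missed "relevant" features) are all assumptions made for illustration. The PCA-fastICA transformation step is omitted; the leader and follower updates follow the standard continuous Salp Swarm Algorithm.

```python
import math
import random

def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    # Assumed S-shaped transfer: feature j is kept with probability sigmoid(x_j).
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]

def toy_fitness(mask, relevant, alpha=0.99):
    # Hypothetical stand-in for a wrapper fitness (lower is better):
    # "error" is the fraction of relevant features left out of the mask,
    # and the second term penalizes the fraction of features selected.
    missed = sum(1 for j in relevant if mask[j] == 0)
    error = missed / len(relevant)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * ratio

def bssa(n_features, fitness, n_salps=20, n_iter=50, lb=-4.0, ub=4.0, seed=0):
    rng = random.Random(seed)
    swarm = [[rng.uniform(lb, ub) for _ in range(n_features)]
             for _ in range(n_salps)]
    best_pos, best_mask, best_fit = None, None, float("inf")
    # Evaluate the initial swarm to find the first food source.
    for pos in swarm:
        mask = binarize(pos, rng)
        fit = fitness(mask)
        if fit < best_fit:
            best_pos, best_mask, best_fit = pos[:], mask, fit
    for it in range(1, n_iter + 1):
        # c1 decays over iterations, shifting from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader moves around the food source (best position so far).
                for j in range(n_features):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    if rng.random() >= 0.5:
                        swarm[i][j] = best_pos[j] + step
                    else:
                        swarm[i][j] = best_pos[j] - step
            else:
                # Followers move halfway toward the salp ahead of them.
                for j in range(n_features):
                    swarm[i][j] = 0.5 * (swarm[i][j] + swarm[i - 1][j])
            # Keep positions inside the search bounds.
            swarm[i] = [max(lb, min(ub, v)) for v in swarm[i]]
            mask = binarize(swarm[i], rng)
            fit = fitness(mask)
            if fit < best_fit:
                best_pos, best_mask, best_fit = swarm[i][:], mask, fit
    return best_mask, best_fit

# Usage: 12 candidate features, of which indices 0, 3 and 7 are "relevant".
relevant = {0, 3, 7}
mask, fit = bssa(12, lambda m: toy_fitness(m, relevant))
print(mask, round(fit, 4))
```

In a real wrapper setting, `toy_fitness` would be replaced by the cross-validated error of a classifier trained on the selected columns of the transformed dataset; the weighting between error and subset size shown here is only one common choice.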
