Unsupervised feature selection based on bio-inspired approaches

Abstract In recent years, the scientific community has witnessed an explosion in the use of pattern recognition algorithms. However, little attention has been paid to the preprocessing tasks that precede the execution of these algorithms. One of these tasks is dimensionality reduction, which locates a subset of features that improves the performance of the mining algorithm and reduces its runtime. Although many methods address the problems in pattern recognition algorithms, effective solutions still need to be researched and explored. Hence, this paper aims to address three of the issues surrounding these algorithms. First, we propose adapting a promising meta-heuristic called the biased random-key genetic algorithm, which constructs its initial population at random. We call this algorithm unsupervised feature selection by biased random-key genetic algorithm I. Next, we propose an approach for building the initial population partly in a deterministic way. We applied this idea in two algorithms, named unsupervised feature selection by particle swarm optimization and unsupervised feature selection by biased random-key genetic algorithm II. Finally, we simulated different datasets to study the effects of relevant and irrelevant attributes, and of noisy and missing data, on the performance of the algorithms. Based on the Wilcoxon rank-sum test, we can state that the proposed algorithms outperform all other methods on different datasets. We also observed that constructing the initial population in a partially deterministic way contributed to the better performance. It should be noted that some methods are more sensitive than others to noisy and missing data, as well as to relevant and irrelevant attributes.
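As a rough illustration of the random-key idea mentioned in the abstract, the following minimal sketch shows how a chromosome of keys in [0, 1) could be decoded into a feature subset, how a biased crossover favors the elite parent, and how an initial population could mix supplied (deterministic) chromosomes with random ones. The abstract does not give these details, so the 0.5 decoding threshold, the function names, the `rho` value, and the "keep all features" seed are all assumptions made for illustration. The clustering-quality objective is omitted because the abstract does not state it.

```python
import random


def decode(keys, threshold=0.5):
    """Map a random-key vector to the indices of selected features.

    Assumption: a feature is kept when its key is >= threshold.
    """
    return [i for i, k in enumerate(keys) if k >= threshold]


def biased_crossover(elite, non_elite, rho, rng):
    """Build a child that inherits each key from the elite parent with probability rho."""
    return [e if rng.random() < rho else n for e, n in zip(elite, non_elite)]


def initial_population(n_features, pop_size, seeded, rng):
    """Start from the supplied chromosomes, then fill the rest at random.

    Passing an empty `seeded` list gives a fully random population;
    passing some chromosomes gives a partly deterministic one.
    """
    population = [list(s) for s in seeded]
    while len(population) < pop_size:
        population.append([rng.random() for _ in range(n_features)])
    return population


rng = random.Random(0)
seed_all = [1.0] * 5  # hypothetical deterministic seed: keep every feature
pop = initial_population(n_features=5, pop_size=4, seeded=[seed_all], rng=rng)
child = biased_crossover(pop[0], pop[1], rho=0.7, rng=rng)
print(decode(child))
```

In a full algorithm, each decoded subset would be scored by some unsupervised criterion and the population sorted into elite and non-elite groups before crossover; those steps are left out here.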
