A multi-objective evolutionary algorithm based on length reduction for large-scale instance selection

Abstract Instance selection is an important data pre-processing task that is widely used in supervised classification. A range of instance selection algorithms based on different techniques has been proposed recently, and among them evolutionary algorithms (EAs) have shown competitive performance. However, when the instance set is large, these EA-based algorithms face serious challenges in search efficiency and computational cost. To address this, this paper proposes a multi-objective evolutionary algorithm based on length reduction, termed LRIS, for large-scale instance selection. LRIS uses a length reduction strategy that recursively shortens each individual in the population, which greatly improves computational efficiency. Specifically, each gene in an individual is deleted with a certain probability, and this probability is determined by two factors: the importance of the corresponding instance in the instance set and the importance of the corresponding gene in the population. Two evolutionary operators (crossover and mutation) based on the length reduction strategy are then developed to generate the offspring population from the reduced population. In addition, an individual repairing operator is designed to repair the length of over-reduced individuals. Experimental results on 12 large-scale data sets demonstrate the efficiency and effectiveness of the proposed LRIS in comparison with state-of-the-art EA-based instance selection algorithms.
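The length reduction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: individuals are rows of a 0/1 matrix over a shared list of candidate instances, gene importance is taken as the fraction of individuals that select an instance, the two importance terms are averaged with equal weight, and the repair step simply restores the least deletable genes until a minimum length is reached. The names `deletion_probabilities`, `reduce_length`, `instance_importance`, and `min_length` are hypothetical and are not taken from LRIS.

```python
import numpy as np


def deletion_probabilities(population, instance_importance):
    """Per-gene deletion probability (illustrative formula, not the paper's).

    population: (n_individuals, n_genes) array of 0/1 values.
    instance_importance: (n_genes,) array of scores in [0, 1].
    """
    # Assumed gene importance: share of individuals that select the instance.
    gene_importance = population.mean(axis=0)
    # Assumed equal weighting of the two importance terms.
    combined = 0.5 * (instance_importance + gene_importance)
    # More important genes are less likely to be deleted.
    return 1.0 - combined


def reduce_length(population, gene_ids, instance_importance, min_length, rng):
    """Delete genes at random, then repair if too few genes survive."""
    p_delete = deletion_probabilities(population, instance_importance)
    keep = rng.random(p_delete.shape[0]) >= p_delete

    # Simple stand-in for a repairing operator: restore the deleted genes
    # with the lowest deletion probability until min_length is reached.
    missing = min_length - int(keep.sum())
    if missing > 0:
        deleted = np.flatnonzero(~keep)
        restore = deleted[np.argsort(p_delete[deleted])][:missing]
        keep[restore] = True

    return population[:, keep], gene_ids[keep], instance_importance[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = rng.integers(0, 2, size=(6, 20))   # 6 individuals, 20 genes
    importance = rng.random(20)                     # placeholder scores
    gene_ids = np.arange(20)                        # indices into the instance set
    reduced, ids, _ = reduce_length(population, gene_ids, importance, 5, rng)
    print(reduced.shape, ids)
```

Calling `reduce_length` repeatedly on its own output gives the recursive shortening described above; `gene_ids` tracks which original instances the surviving genes refer to, so the reduced individuals can still be evaluated against the full instance set.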
