Evolutionary prototype selection for multi-output regression

Abstract A novel approach to prototype selection for multi-output regression data sets is presented. A multi-objective evolutionary algorithm evaluates candidate selections against two criteria: compression of the training data set and prediction quality, expressed as root mean squared error. During training, the error was evaluated with a multi-target regressor based on k-NN, while the tests were performed with four different multi-target predictive models. The distance matrices used by the multi-target regressor were cached to speed up the evaluation. Multiple Pareto fronts, each with its own population-initialization probabilities and evolutionary parameters, were used to prevent overfitting and to obtain a broader range of solutions. Results on benchmark data sets showed that the proposed method greatly reduced data set size and, at the same time, improved the predictive performance of multi-output regressors trained on the reduced data.
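
A minimal sketch of the bi-objective fitness evaluation described in the abstract, not the paper's implementation: each candidate prototype subset is scored by training-set compression and by the RMSE of a k-NN multi-target regressor. The function name, the boolean selection mask, the validation split, and the value of k are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def evaluate_selection(mask, X_train, Y_train, X_val, Y_val, k=3):
    """Return (compression, rmse) for a boolean prototype mask over X_train."""
    X_sel, Y_sel = X_train[mask], Y_train[mask]
    if X_sel.shape[0] < k:                      # too few prototypes to fit k-NN
        return 0.0, np.inf
    # Objective 1: fraction of training instances discarded (to be maximized).
    compression = 1.0 - X_sel.shape[0] / X_train.shape[0]
    # Objective 2: root mean squared error over all targets (to be minimized);
    # KNeighborsRegressor handles multi-output targets natively.
    model = KNeighborsRegressor(n_neighbors=k).fit(X_sel, Y_sel)
    Y_pred = model.predict(X_val)
    rmse = np.sqrt(np.mean((Y_val - Y_pred) ** 2))
    return compression, rmse
```

For simplicity, this sketch refits the regressor for every candidate; the paper instead caches the distance matrices used by the k-NN regressor, so repeated evaluations during the evolutionary search avoid recomputing pairwise distances.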
