A convex hull-based data selection method for data driven models

- For data-driven models, design data should cover the whole data range.
- Convex hull algorithms can be applied as a method for data selection.
- A randomized approximation convex hull algorithm, ApproxHull, is proposed.
- ApproxHull can be used in high dimensions, with acceptable execution time and low memory requirements.
- ApproxHull improves the performance of classification and regression models.

The accuracy of classification and regression tasks based on data-driven models, such as neural networks or support vector machines, depends largely on selecting proper data for designing these models: data that cover the whole input range in which the models will be employed. The convex hull algorithm can be applied as a method for data selection. However, because of their high computational complexity, conventional implementations of this method are not feasible in high dimensions. In this paper, we propose a randomized approximation convex hull algorithm that can be used in high dimensions with acceptable execution time and low memory requirements. Simulation results show that data selection with the proposed algorithm (named ApproxHull) can improve the performance of classification and regression models compared with random data selection.
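To make the idea of hull-based data selection concrete, here is a minimal sketch. It is not ApproxHull; the abstract does not describe ApproxHull's steps. It uses a generic randomized technique instead: a point that maximizes u·x for some direction u, with ties broken lexicographically, is always a vertex of the convex hull. Probing the signed coordinate axes plus many random Gaussian directions therefore returns a subset of the true hull vertices (an inner approximation). The cost is O((n_directions + 2d)·n·d) time and O(n·d) memory, no matter how many facets the exact hull has. The function names, the `train_fraction` split rule, and the assumption of numeric feature vectors are all hypothetical, chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dot(u: Sequence[float], x: Point) -> float:
    return sum(ui * xi for ui, xi in zip(u, x))


def extreme_points_by_random_directions(
    points: Sequence[Point], n_directions: int = 1000, seed: int = 0
) -> List[int]:
    """Indices of points that maximize u.x for at least one probed direction u.

    Every returned index refers to a convex hull vertex: ties on u.x are broken
    lexicographically, which always lands on a vertex. Vertices that no probed
    direction happens to hit are missed, so this is an inner approximation.
    (Illustrative technique, not the ApproxHull algorithm.)
    """
    rng = random.Random(seed)
    d = len(points[0])
    directions: List[List[float]] = []
    # The 2d signed coordinate axes pick points attaining each feature's max and min.
    for k in range(d):
        for sign in (1.0, -1.0):
            e = [0.0] * d
            e[k] = sign
            directions.append(e)
    # Isotropic Gaussian directions; no normalization is needed because
    # scaling u by a positive constant does not change the argmax.
    for _ in range(n_directions):
        directions.append([rng.gauss(0.0, 1.0) for _ in range(d)])

    selected = set()
    for u in directions:
        best = max(
            range(len(points)),
            key=lambda i: (_dot(u, points[i]), tuple(points[i])),
        )
        selected.add(best)
    return sorted(selected)


def hull_augmented_split(
    points: Sequence[Point],
    train_fraction: float = 0.6,
    n_directions: int = 1000,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: place the approximate hull vertices in the training
    set, then fill the remaining training slots by uniform random sampling."""
    hull = extreme_points_by_random_directions(points, n_directions, seed)
    hull_set = set(hull)
    n_train = max(len(hull), round(train_fraction * len(points)))
    rest = [i for i in range(len(points)) if i not in hull_set]
    random.Random(seed + 1).shuffle(rest)
    n_fill = n_train - len(hull)
    return sorted(hull + rest[:n_fill]), sorted(rest[n_fill:])


if __name__ == "__main__":
    # Unit-square corners (indices 0-3) plus three interior points (4-6).
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.3), (0.7, 0.6)]
    print(extreme_points_by_random_directions(pts))
    print(hull_augmented_split(pts, train_fraction=0.6))
```

In this toy example the four corners are selected and the interior points are left for the test set. That is the "cover the whole input range" property the highlights describe, compared with a purely random split that could leave a corner out of the design data. Raising `n_directions` trades run time for a higher chance of catching every vertex.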