A memetic algorithm to select training data for support vector machines

In this paper, we propose a new memetic algorithm (MASVM) for fast and efficient selection of a valuable training set for support vector machines (SVMs). This step is crucial, especially for large and noisy data sets, because SVM training has high time and memory complexity. Most state-of-the-art methods analyze the data geometry in both the input and kernel spaces. Although evolutionary algorithms have proven very efficient for this purpose, they have not been studied extensively so far. Here, we propose a new method that employs an adaptive genetic algorithm enhanced by refinement techniques. The refinements use a pool of the support vectors identified so far at various steps of the algorithm. An extensive experimental study performed on well-known benchmark, real-world, and artificial data sets confirms the efficacy, robustness, and convergence capabilities of the proposed approach and shows that it is competitive with other state-of-the-art techniques.
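The general idea of a genetic search refined by a pool of support vectors can be illustrated with a minimal sketch. This is not the MASVM procedure itself, whose operators and adaptation scheme are not given in the abstract. The sketch assumes a fixed subset size (so the adaptive part is omitted), and every name in it (`memetic_select`, `evaluate`, `educate`, `sv_pool`) is hypothetical. SVM training is hidden behind a user-supplied `evaluate` callable that is assumed to return a fitness value and the indices of the support vectors it found.

```python
import random
from typing import Callable, FrozenSet, Set, Tuple

# Hypothetical interface (an assumption, not from the paper): train an SVM on
# the given sample indices and return (fitness, support-vector indices).
Evaluate = Callable[[FrozenSet[int]], Tuple[float, Set[int]]]


def memetic_select(n_samples: int, subset_size: int, evaluate: Evaluate,
                   pop_size: int = 10, generations: int = 20,
                   seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Search for a training subset of fixed size with high fitness."""
    rng = random.Random(seed)
    sv_pool: Set[int] = set()  # support vectors seen so far, across all runs

    def score(ind: FrozenSet[int]) -> float:
        fitness, svs = evaluate(ind)
        sv_pool.update(svs)  # remember every support vector identified
        return fitness

    def crossover(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
        # Child draws its indices from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), subset_size))

    def mutate(ind: FrozenSet[int]) -> FrozenSet[int]:
        # Swap one selected index for a random unselected one.
        candidates = [i for i in range(n_samples) if i not in ind]
        if not candidates:
            return ind
        out = rng.choice(sorted(ind))
        return frozenset((ind - {out}) | {rng.choice(candidates)})

    def educate(ind: FrozenSet[int]) -> FrozenSet[int]:
        # Refinement step: replace indices that never acted as support
        # vectors with pooled support vectors missing from this individual.
        incoming = sorted(sv_pool - ind)
        outgoing = sorted(ind - sv_pool)
        rng.shuffle(incoming)
        rng.shuffle(outgoing)
        k = min(len(incoming), len(outgoing), max(1, subset_size // 4))
        return frozenset((ind - set(outgoing[:k])) | set(incoming[:k]))

    population = [frozenset(rng.sample(range(n_samples), subset_size))
                  for _ in range(pop_size)]
    scored = [(score(ind), ind) for ind in population]

    for _ in range(generations):
        scored.sort(key=lambda t: t[0], reverse=True)
        parents = scored[:max(2, pop_size // 2)]  # truncation selection
        children = []
        while len(children) < pop_size:
            (_, a), (_, b) = rng.sample(parents, 2)
            child = educate(mutate(crossover(a, b)))
            children.append((score(child), child))
        # Elitist survival: keep the best of parents and children.
        scored = sorted(parents + children, key=lambda t: t[0],
                        reverse=True)[:pop_size]

    best_fitness, best = max(scored, key=lambda t: t[0])
    return best, best_fitness
```

In practice, `evaluate` would wrap an actual SVM trainer and report, for example, validation accuracy together with the support vectors of the trained model. The truncation selection, union crossover, and swap mutation used here are generic placeholders chosen for brevity.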
