Dynamically Adaptive Genetic Algorithm to Select Training Data for SVMs

This paper addresses the important problem of training set selection for support vector machines (SVMs). Selection is a critical step for large and noisy data sets because of the high time and memory complexity of SVM training. Several methods have been proposed so far, most of them based on analyzing the data geometry in either the input or the kernel space. Here, we propose a new dynamically adaptive genetic algorithm (DAGA) to select valuable training sets. We demonstrate that DAGA not only selects the training data quickly, but also dynamically determines the desired training set size without any prior information. We analyze the impact of the support vector ratio, defined as the percentage of support vectors in the training set, on DAGA's performance. We also investigate and discuss the possibility of incorporating reduced SVMs into the proposed algorithm. An extensive experimental study shows that DAGA offers fast and effective training set optimization that is independent of the entire training set size.
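
To make the two central notions concrete, the following is a minimal sketch and not the DAGA algorithm itself. It computes the support vector ratio as defined above and runs a toy genetic search over subsets of training-vector indices. The function names, the truncation selection, the union-based crossover, the single-index mutation, and the fixed subset size are all illustrative assumptions; in particular, the sketch does not reproduce the dynamic adaptation of the training set size. The `fitness` callable is a placeholder: in practice it would train an SVM on the selected vectors and return a validation score.

```python
import random


def support_vector_ratio(n_support_vectors, training_set_size):
    """Percentage of vectors in the training set that are support vectors."""
    if training_set_size <= 0:
        raise ValueError("training set must be non-empty")
    return 100.0 * n_support_vectors / training_set_size


def select_training_subset(n_samples, subset_size, fitness,
                           generations=20, population_size=10, seed=0):
    """Toy genetic search over fixed-size index subsets (hypothetical sketch).

    `fitness` maps a list of sample indices to a score (higher is better).
    """
    rng = random.Random(seed)
    # Each individual is a list of distinct indices into the full training set.
    population = [rng.sample(range(n_samples), subset_size)
                  for _ in range(population_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child from the union of both parents.
            pool = sorted(set(a) | set(b))
            child = rng.sample(pool, subset_size)
            # Mutation: swap one index for a random index not yet in the child.
            unused = [i for i in range(n_samples) if i not in child]
            if unused:
                child[rng.randrange(subset_size)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Stand-in fitness (assumption): prefers subsets with large indices.
    best = select_training_subset(50, 5, fitness=sum, seed=1)
    print(sorted(best), support_vector_ratio(25, 100))
```

Because each individual stores only `subset_size` indices, the cost of one fitness evaluation in this sketch depends on the subset size rather than on `n_samples`, which is the intuition behind selecting a small training set before SVM training.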
