Adaptive Genetic Algorithm to Select Training Data for Support Vector Machines

This paper presents a new adaptive genetic algorithm (AGA) for selecting training data for support vector machines (SVMs). The selection of SVM training data strongly influences both classification accuracy and classification time, especially for large and noisy data sets. In the proposed AGA, a population of solutions evolves over time, and the AGA parameters, including the chromosome length, are adapted according to the current state of the solution-space exploration. We propose a new multi-parent crossover operator for an efficient search. We also introduce and apply a new metric of the distance between individuals, based on a fast analysis of the distribution of vectors in the feature space obtained with principal component analysis. An extensive experimental study performed on well-known benchmark sets, along with real-world and artificial data sets, confirms that the AGA outperforms a standard GA in terms of convergence capabilities. It also reduces the number of support vectors and allows for faster SVM classification.
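To illustrate the general idea of evolving training-data subsets for an SVM, the sketch below shows a plain genetic algorithm of this kind. It is a minimal, simplified example under assumed choices (scikit-learn's SVC as the learner, fixed-length index chromosomes, truncation selection, single-point crossover, and held-out validation accuracy as fitness); it does not reproduce the paper's AGA, which additionally adapts the chromosome length, uses a multi-parent crossover, and relies on a PCA-based distance between individuals.

```python
# Sketch of GA-based training-data selection for an SVM (assumptions noted above;
# this is NOT the paper's AGA, only the baseline selection scheme it builds on).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data standing in for a large, possibly noisy training set.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

SUBSET_SIZE = 40   # number of training vectors encoded by each chromosome
POP_SIZE = 20
GENERATIONS = 30


def fitness(chromosome):
    """Validation accuracy of an SVM trained only on the selected vectors."""
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train[chromosome], y_train[chromosome])
    return clf.score(X_val, y_val)


def random_chromosome():
    """A chromosome is a set of distinct indices into the training set."""
    return rng.choice(len(X_train), size=SUBSET_SIZE, replace=False)


def crossover(a, b):
    """Single-point crossover on index chromosomes, refilling removed duplicates."""
    cut = rng.integers(1, SUBSET_SIZE)
    child = np.unique(np.concatenate([a[:cut], b[cut:]]))
    while len(child) < SUBSET_SIZE:
        extra = rng.integers(len(X_train))
        if extra not in child:
            child = np.append(child, extra)
    return child


def mutate(chromosome, rate=0.1):
    """Replace each gene with a random training index with probability `rate`."""
    for i in range(len(chromosome)):
        if rng.random() < rate:
            chromosome[i] = rng.integers(len(X_train))
    return chromosome


population = [random_chromosome() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[: POP_SIZE // 2]  # truncation selection keeps the best half
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print(f"best subset of {SUBSET_SIZE} vectors, validation accuracy = {fitness(best):.3f}")
```

In this simplified scheme the chromosome length (and hence the refined set size) is fixed in advance; the AGA described above instead adapts it during the search, which is one of the paper's central contributions.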
