Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs

Support vector machines (SVMs) are among the most popular and powerful machine learning techniques, but their training suffers from high time and memory complexity. This is especially problematic for large and noisy datasets. In this paper, we propose a new adaptive memetic algorithm (PCA2MA) for selecting valuable SVM training data from the entire set. It helps improve the classifier score and speeds up classification by decreasing the number of support vectors. In PCA2MA, a population of reduced training sets undergoes evolution, complemented by refinement procedures. We propose to exploit both a priori information about the training set (extracted using data geometry analysis) and the knowledge attained dynamically during the PCA2MA execution to enhance the refined sets. We also introduce a new adaptation scheme to control the pivotal algorithm parameters on the fly, based on the current search state. An extensive experimental study performed on benchmark, real-world, and artificial datasets confirms the efficacy and convergence capabilities of the proposed approach. We demonstrate that PCA2MA is highly competitive with other state-of-the-art techniques.

Highlights
We propose a new adaptive memetic algorithm to select SVM training data.
We perform PCA-based preprocessing to determine valuable training samples.
We apply a new scheme to adapt the refined training set size and selection scheme.
We evaluate the importance of particular components of our algorithm.
We demonstrate the effectiveness and efficiency of the proposed memetic algorithm.
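The evolve-then-refine loop described in the abstract can be illustrated with a minimal sketch. It is not the PCA2MA implementation: the function `memetic_select`, its parameters, and the toy fitness are hypothetical names introduced here for illustration. The fitness callable stands in for an SVM score obtained by training on the selected subset, the `candidates` pool stands in for samples flagged as valuable by a prior geometry analysis, and the adaptation of parameters during the search is omitted for brevity.

```python
import random


def memetic_select(n_samples, fitness, candidates, size=10,
                   pop_size=8, generations=30, seed=0):
    """Evolve index subsets of range(n_samples), each with `size` elements.

    fitness:    callable(frozenset) -> float, higher is better (assumed
                stand-in for an SVM score on a validation set).
    candidates: indices assumed to be valuable a priori; used only by the
                refinement (local search) step.
    """
    rng = random.Random(seed)
    pool = sorted(candidates)

    def refine(subset):
        # Local search: swap one member for an a priori candidate and keep
        # the swap only if the fitness improves.
        outside = [i for i in pool if i not in subset]
        if not outside:
            return subset
        drop = rng.choice(sorted(subset))
        trial = (subset - {drop}) | {rng.choice(outside)}
        return trial if fitness(trial) > fitness(subset) else subset

    def crossover(a, b):
        # The child draws `size` distinct indices from the union of parents.
        return frozenset(rng.sample(sorted(a | b), size))

    def mutate(subset):
        # With probability 0.2, replace one member by a random outside index.
        outside = [i for i in range(n_samples) if i not in subset]
        if not outside or rng.random() > 0.2:
            return subset
        drop = rng.choice(sorted(subset))
        return (subset - {drop}) | {rng.choice(outside)}

    population = [frozenset(rng.sample(range(n_samples), size))
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism: the best individual survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(refine(mutate(crossover(a, b))))
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), history


# Toy usage: 20 hidden "valuable" samples among 100; the candidate pool is a
# noisy superset of them (both are illustrative assumptions).
useful = set(range(0, 100, 5))
toy_fitness = lambda s: len(s & useful)
noisy_pool = [i for i in range(100) if i % 5 in (0, 1)]
best, score, history = memetic_select(100, toy_fitness, noisy_pool)
```

In a realistic setting the fitness call would train an SVM on the selected samples and score it on held-out data, which dominates the running time, so caching fitness values per subset would be a natural next step.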
