Memetic Evolution of Training Sets with Adaptive Radial Basis Kernels for Support Vector Machines

Support vector machines (SVMs) are a supervised learning technique applicable to binary and multi-class classification, as well as regression. SVMs handle continuous and categorical variables seamlessly. Their training is, however, both time- and memory-costly for large training sets, and an ill-chosen kernel function or poorly tuned hyperparameters lead to suboptimal decision hyperplanes. In this paper, we introduce a memetic algorithm for evolving SVM training sets with adaptive radial basis function kernels, both to ease the deployment of SVMs in emerging big-data applications and to improve their generalization over unseen data. We build upon two observations. First, only a small subset of all training vectors, called the support vectors, contributes to the position of the decision boundary; the remaining vectors can therefore be removed from the training set without deteriorating the performance of the model. Second, selecting different kernel hyperparameters for different training vectors may better reflect the subtle characteristics of the feature space while determining the hyperplane. Experiments over almost 100 benchmark and synthetic sets showed that our algorithm delivers models outperforming both SVMs optimized using state-of-the-art evolutionary techniques and other supervised learners.
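The second observation, assigning a different RBF width to each training vector, can be sketched as a kernel matrix in which each pair of vectors receives an averaged width. This is a minimal illustration of one common symmetrization scheme, not necessarily the exact formulation used in the paper; the function name and the per-vector gammas below are hypothetical.

```python
import numpy as np

def adaptive_rbf_kernel(X, Y, gamma_x, gamma_y):
    """RBF kernel with a per-vector width.

    Symmetrized by averaging the widths of the two vectors in each pair
    (one possible scheme; the paper's formulation may differ)."""
    # Squared Euclidean distances between every pair of rows of X and Y.
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Pairwise width: mean of the two per-vector gammas.
    g = 0.5 * (np.asarray(gamma_x)[:, None] + np.asarray(gamma_y)[None, :])
    # Clamp tiny negative distances caused by floating-point round-off.
    return np.exp(-g * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
gammas = np.array([0.1, 0.5, 1.0, 2.0, 0.3])  # hypothetical per-vector widths
K = adaptive_rbf_kernel(X, X, gammas, gammas)
```

With identical gammas for every vector this reduces to the standard RBF kernel; the averaged widths keep the matrix symmetric, which is required for it to be usable as a kernel in an SVM solver.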
