Learning SVM with Complex Multiple Kernels Evolved by Genetic Programming

Classic kernel-based classifiers use a single kernel, but real-world applications have emphasized the need to consider a combination of kernels, also known as a multiple kernel (MK), in order to boost classification accuracy by adapting better to the characteristics of the data. Our purpose is to automatically design a complex multiple kernel by evolutionary means. To this end, we propose a hybrid model that combines a Genetic Programming (GP) algorithm and a kernel-based Support Vector Machine (SVM) classifier. In our model, each GP chromosome is a tree that encodes the mathematical expression of a multiple kernel. The evolutionary search for the optimal MK is guided by the fitness (or efficiency) of each candidate MK. The complex multiple kernels evolved in this manner (eCMKs) are compared to several classic simple kernels (SKs), to a convex linear multiple kernel (cLMK), and to an evolutionary linear multiple kernel (eLMK) on several real-world data sets from the UCI repository. The numerical experiments show that SVMs using the evolved complex multiple kernels perform better than SVMs using the classic simple kernels. Moreover, on the considered data sets, the new multiple kernels outperform both linear multiple kernels, the cLMK and the eLMK. These results emphasize that the SVM algorithm requires a combination of kernels more complex than a linear one in order to boost its performance.
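
To make the tree encoding concrete, the following is a minimal Python sketch of a chromosome represented as an expression tree over simple kernels. It is an illustration under assumptions, not the model used in the experiments: the abstract does not list the function or terminal sets, so the operators (`+`, `*`), the leaf kernels (linear, polynomial, RBF), and all names (`Leaf`, `Node`, `evaluate`, `gram_matrix`) are hypothetical. Sums and products are chosen because they preserve positive semi-definiteness of the combined kernel.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


# Simple kernels used as tree leaves (assumed terminal set).
def linear(x: Vector, z: Vector) -> float:
    return dot(x, z)


def make_poly(degree: int, coef0: float = 1.0) -> Kernel:
    return lambda x, z: (dot(x, z) + coef0) ** degree


def make_rbf(gamma: float) -> Kernel:
    return lambda x, z: math.exp(
        -gamma * sum((a - b) ** 2 for a, b in zip(x, z))
    )


@dataclass(frozen=True)
class Leaf:
    kernel: Kernel


@dataclass(frozen=True)
class Node:
    op: str  # "+" or "*" (assumed function set)
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: Vector, z: Vector) -> float:
    """Recursively evaluate the kernel expression encoded by the tree."""
    if isinstance(tree, Leaf):
        return tree.kernel(x, z)
    left = evaluate(tree.left, x, z)
    right = evaluate(tree.right, x, z)
    if tree.op == "+":
        return left + right
    if tree.op == "*":
        return left * right
    raise ValueError(f"unsupported operator: {tree.op!r}")


def gram_matrix(tree: Tree, points: Sequence[Vector]) -> List[List[float]]:
    """Kernel matrix that an SVM trainer would consume for this chromosome."""
    return [[evaluate(tree, p, q) for q in points] for p in points]


# Hypothetical chromosome encoding (linear + rbf) * poly.
example_tree = Node(
    "*",
    Node("+", Leaf(linear), Leaf(make_rbf(0.5))),
    Leaf(make_poly(2)),
)
```

In a full GP loop, each such tree would be scored by training an SVM with its Gram matrix and measuring classification accuracy; that fitness step, together with crossover and mutation on subtrees, is omitted here.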
