GP-Based Kernel Evolution for L2-Regularization Networks

In kernel-based learning methods, a crucial design choice is the kernel function itself. Although there is, in principle, an infinite range of candidates, a handful of kernels covers the majority of practical applications. This is partly due to the difficulty of choosing an optimal kernel function in the absence of a priori information. In this respect, Genetic Programming (GP) techniques have shown promising capabilities for learning non-trivial kernel functions that outperform commonly used ones. However, previous experiments have been restricted to Support Vector Machines (SVMs) and have not addressed problems specific to GP implementations, such as diversity maintenance. The aim of this paper is therefore twofold. First, we present a customized GP-based kernel search method that uses an L2-Regularization Network as the base learning algorithm. Second, we investigate the problem of diversity maintenance in the context of kernel evolution and test an adaptive criterion for maintaining diversity in our algorithm. On the first point, experiments show a gain in accuracy for our method over fine-tuned standard kernels. On the second, we show that diversity decreases critically fast during the GP iterations, but that this decrease does not seem to affect the performance of the algorithm.
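To make the approach concrete, below is a minimal, self-contained sketch (not the authors' implementation) of how GP-evolved kernel trees can be scored with an L2-Regularization Network as the base learner: positive semi-definiteness is preserved by restricting the GP function set to sums and products of standard kernels, fitness is the validation accuracy of the regularized kernel expansion with coefficients c = (K + λnI)^{-1} y, and a crude phenotypic-diversity proxy is tracked across generations. All function names, parameter values, and the diversity measure here are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- terminal kernels: each factory returns a function (X, Z) -> Gram matrix ---
def make_rbf():
    gamma = 10.0 ** rng.uniform(-2, 1)
    return lambda X, Z: np.exp(-gamma * ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1))

def make_poly():
    d = int(rng.integers(2, 4))
    return lambda X, Z: (1.0 + X @ Z.T) ** d

def make_linear():
    return lambda X, Z: X @ Z.T

TERMINALS = [make_rbf, make_poly, make_linear]

# --- kernel expression trees; sums and products of PSD kernels remain PSD ---
def random_tree(depth=3):
    if depth == 0 or rng.random() < 0.3:
        return ('leaf', TERMINALS[rng.integers(len(TERMINALS))]())
    op = '+' if rng.random() < 0.5 else '*'
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def gram(tree, X, Z):
    if tree[0] == 'leaf':
        return tree[1](X, Z)
    A, B = gram(tree[1], X, Z), gram(tree[2], X, Z)
    return A + B if tree[0] == '+' else A * B

# --- L2-Regularization Network as base learner: solve (K + lam*n*I) c = y ---
def fitness(tree, Xtr, ytr, Xva, yva, lam=1e-2):
    n = len(ytr)
    c = np.linalg.solve(gram(tree, Xtr, Xtr) + lam * n * np.eye(n), ytr)
    pred = np.sign(gram(tree, Xva, Xtr) @ c)   # +/-1 labels assumed
    return float(np.mean(pred == yva))

def mutate(tree):
    # subtree mutation: replace a randomly chosen node with a fresh subtree
    if tree[0] == 'leaf' or rng.random() < 0.3:
        return random_tree(depth=2)
    op, left, right = tree
    return (op, mutate(left), right) if rng.random() < 0.5 else (op, left, mutate(right))

def diversity(fits):
    # crude phenotypic-diversity proxy (fraction of distinct fitness values);
    # the paper's actual measure and adaptive criterion may differ
    return len(set(np.round(fits, 6))) / len(fits)

def evolve(Xtr, ytr, Xva, yva, pop_size=30, generations=20):
    pop = [random_tree() for _ in range(pop_size)]
    fits = [fitness(t, Xtr, ytr, Xva, yva) for t in pop]
    for g in range(generations):
        new_pop = []
        for _ in range(pop_size):
            i, j = rng.integers(pop_size, size=2)        # size-2 tournament
            new_pop.append(mutate(pop[i] if fits[i] >= fits[j] else pop[j]))
        pop = new_pop
        fits = [fitness(t, Xtr, ytr, Xva, yva) for t in pop]
        print(f"gen {g}: best={max(fits):.3f} diversity={diversity(fits):.2f}")
    best = int(np.argmax(fits))
    return pop[best], fits[best]

# toy usage: XOR-like nonlinear problem with +/-1 labels
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])
best_tree, acc = evolve(X[:150], y[:150], X[150:], y[150:])
print("evolved-kernel validation accuracy:", acc)
```

Restricting internal nodes to + and × keeps every evolved expression a valid Mercer kernel by closure; richer primitives (positive scaling, exponentiation) are possible but require the same closure argument to guarantee valid kernels.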
