A meta-learning approach to automatic kernel selection for support vector machines

Choosing an appropriate kernel is the most important ingredient of kernel-based learning methods such as the support vector machine (SVM). Automatic kernel selection is a key issue given the number of kernels available and the current trial-and-error nature of selecting the best kernel for a given problem. This paper introduces a new method for automatic kernel selection, with empirical results based on classification. The empirical study compares five kernels on 112 different classification problems, using the popular kernel-based statistical learning algorithm SVM. We evaluate the kernels’ performance in terms of accuracy measures. We then focus on answering the question: which kernel is best suited to which type of classification problem? Our meta-learning methodology involves measuring the problem characteristics using classical, distance-based and distribution-based statistical information. We then combine these measures with the empirical results to present a rule-based method for selecting the most appropriate kernel for a classification problem. The rules are generated by the decision tree algorithm C5.0 and are evaluated with 10-fold cross-validation. All generated rules offer high accuracy ratings.
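To make the meta-learning setup more concrete, the following is a minimal sketch of how one row of the meta-dataset might be assembled: a classification problem is summarised by a few characteristics, and the row is labelled with the kernel that achieved the highest accuracy on it. The abstract does not list the individual measures or name the five kernels, so the specific features (problem size, mean absolute skewness, mean kurtosis, mean distance between class centroids), the function names, and the kernel names and accuracy values in the usage note are all assumptions chosen for illustration, not the paper's actual choices. A decision tree learner (C5.0 in the paper) would then be trained on many such rows; that step is not shown.

```python
import itertools
import math
import statistics
from typing import Dict, List, Sequence


def skewness(values: Sequence[float]) -> float:
    """Population skewness (third standardized moment); 0.0 for constant data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def kurtosis(values: Sequence[float]) -> float:
    """Population kurtosis (fourth standardized moment); 0.0 for constant data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values)


def mean_centroid_distance(X: List[List[float]], y: List[int]) -> float:
    """Mean Euclidean distance between all pairs of class centroids."""
    by_class: Dict[int, List[List[float]]] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    centroids = [
        [statistics.fmean(col) for col in zip(*rows)]
        for rows in by_class.values()
    ]
    pairs = list(itertools.combinations(centroids, 2))
    if not pairs:  # a single class has no between-class distance
        return 0.0
    return statistics.fmean(math.dist(a, b) for a, b in pairs)


def meta_features(X: List[List[float]], y: List[int]) -> Dict[str, float]:
    """Summarise one classification problem (illustrative measures only)."""
    columns = list(zip(*X))
    return {
        # classical: basic size descriptors of the problem
        "n_samples": float(len(X)),
        "n_features": float(len(columns)),
        "n_classes": float(len(set(y))),
        # distribution-based: shape of the attribute distributions
        "mean_abs_skewness": statistics.fmean(abs(skewness(c)) for c in columns),
        "mean_kurtosis": statistics.fmean(kurtosis(c) for c in columns),
        # distance-based: how far apart the classes sit
        "mean_centroid_distance": mean_centroid_distance(X, y),
    }


def best_kernel(accuracies: Dict[str, float]) -> str:
    """Label a problem with the kernel that scored the highest accuracy."""
    return max(accuracies, key=accuracies.get)


def meta_example(X, y, accuracies: Dict[str, float]) -> Dict[str, object]:
    """One training row for the rule learner: characteristics plus target."""
    row: Dict[str, object] = dict(meta_features(X, y))
    row["best_kernel"] = best_kernel(accuracies)
    return row
```

As a usage illustration, `meta_example(X, y, {"linear": 0.81, "rbf": 0.93, "polynomial": 0.88})` would produce a row whose `best_kernel` field is `"rbf"`; the accuracy values here are made up and would in practice come from training an SVM with each kernel on the problem.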
