Novel Distance-Based SVM Kernels for Infinite Ensemble Learning

Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of base hypotheses. However, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. It has recently been shown that the support vector machine (SVM) with a carefully crafted kernel can be used to construct a nonsparse ensemble of infinitely many hypotheses. Such infinite ensembles may surpass finite and/or sparse ensembles in learning performance and robustness. In this paper, we derive two novel kernels, the stump kernel and the perceptron kernel, for infinite ensemble learning. The stump kernel embodies an infinite number of decision stumps, and measures the similarity between examples by the ℓ1-norm distance. The perceptron kernel embodies perceptrons, and works with the ℓ2-norm distance. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with these kernels has similar performance to SVM with the Gaussian kernel, but enjoys the benefit of faster parameter selection. These properties make the kernels favorable choices in practice.
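To make the distance-based construction concrete, the sketch below (Python with scikit-learn, not taken from the paper) trains an SVM with kernels of the form K(x, x') = Δ − ‖x − x'‖_1 (stump kernel) and K(x, x') = Δ − ‖x − x'‖_2 (perceptron kernel), which matches the forms described above. The constant Δ, the data, and all variable names are illustrative assumptions, and the example is a minimal usage sketch rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def l1_dist(A, B):
    # Pairwise l1 (Manhattan) distances between rows of A and rows of B.
    return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)

def l2_dist(A, B):
    # Pairwise l2 (Euclidean) distances between rows of A and rows of B.
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stump_kernel(A, B, delta):
    # Stump kernel: K(x, x') = delta - ||x - x'||_1.
    return delta - l1_dist(A, B)

def perceptron_kernel(A, B, delta):
    # Perceptron kernel: K(x, x') = delta - ||x - x'||_2.
    return delta - l2_dist(A, B)

# --- illustrative usage with a precomputed-kernel SVM ---
rng = np.random.default_rng(0)
X_tr = rng.uniform(-1, 1, size=(200, 5))
y_tr = np.sign(X_tr[:, 0] - 0.5 * X_tr[:, 1])

# delta is a constant shift; fixing it once from the training data (an
# illustrative choice) keeps the training and test Gram matrices consistent.
delta = l1_dist(X_tr, X_tr).max() + 1.0

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(stump_kernel(X_tr, X_tr, delta), y_tr)

X_te = rng.uniform(-1, 1, size=(50, 5))
# Test Gram matrix: rows index test points, columns index training points.
y_pred = clf.predict(stump_kernel(X_te, X_tr, delta))
```

Swapping stump_kernel for perceptron_kernel changes only the distance used; beyond that, only the soft-margin parameter C remains to be tuned, which is the faster parameter selection referred to in the abstract.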
