Support Vector Machinery for Infinite Ensemble Learning

Ensemble learning algorithms such as boosting can achieve better performance by averaging the predictions of a set of base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. It is thus not clear whether we should construct an ensemble classifier with a larger, or even an infinite, number of hypotheses. In addition, constructing an infinite ensemble is itself a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble by embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function (RBF) kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that an SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, an SVM with the stump kernel or the perceptron kernel performs similarly to an SVM with the Gaussian RBF kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice.
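As a concrete illustration of the kernel-embedding idea, the sketch below plugs the stump and perceptron kernels into an off-the-shelf SVM solver. It uses the simplified forms K_S(x, x') = -||x - x'||_1 and K_P(x, x') = -||x - x'||_2, with the constant offsets of the full kernels dropped, since shifting a kernel by a constant leaves the SVM solution unchanged when a bias term is present. The dataset, solver, and parameter choices here are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (assumptions: NumPy and scikit-learn; synthetic data) of
# using the stump and perceptron kernels with a standard SVM solver via a
# precomputed Gram matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def stump_kernel(X1, X2):
    # Simplified stump kernel: K(x, x') = -||x - x'||_1.
    # The constant offset from the full kernel is omitted; shifting a
    # kernel by a constant does not change the SVM solution when a bias
    # term is used.
    return -np.abs(X1[:, None, :] - X2[None, :, :]).sum(axis=2)

def perceptron_kernel(X1, X2):
    # Simplified perceptron kernel: K(x, x') = -||x - x'||_2.
    return -np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, kernel in [("stump", stump_kernel), ("perceptron", perceptron_kernel)]:
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(kernel(X_tr, X_tr), y_tr)           # train on the Gram matrix
    acc = clf.score(kernel(X_te, X_tr), y_te)   # rows: test points, cols: training points
    print(f"{name} kernel test accuracy: {acc:.3f}")
```

These simplified kernels are conditionally positive definite rather than positive definite, so they rely on the bias term of the standard SVM formulation; libsvm-style solvers such as SVC accept them in practice. Note also that neither kernel has a width parameter, so only the soft-margin cost C needs tuning, which is the source of the faster parameter selection claimed above.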
