Column-generation boosting methods for mixture of kernels

We devise a boosting approach to classification and regression based on column generation using a mixture of kernels. Traditional kernel methods build models from a single positive semi-definite kernel, with the kernel type fixed in advance and its parameters chosen by cross-validation performance. Our approach instead builds models as mixtures drawn from a library of kernel models, and the algorithm automatically determines which kernels appear in the final model. Both 1-norm and 2-norm regularization are employed to restrict the ensemble of kernel models; the resulting solutions are sparser, which significantly reduces testing time. By extending column generation (CG) optimization, previously developed for linear programs with 1-norm regularization, to quadratic programs with 2-norm regularization, we can solve many learning formulations while leveraging existing algorithms for constructing single-kernel models. By assigning different priorities to the columns to be generated, we scale CG boosting to large datasets. Experimental results on benchmark data demonstrate the method's effectiveness.
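
To make the abstract's algorithm concrete, the following is a minimal sketch of the 1-norm (linear programming) case: LPBoost-style column generation where each column is a single-kernel model drawn from a library. The Gaussian kernel library, the choice of training points as kernel centers, the cost parameter C, and the use of scipy's linprog are illustrative assumptions, not the paper's exact setup, which also covers the 2-norm quadratic-programming case and regression.

```python
import numpy as np
from scipy.optimize import linprog


def gaussian_column(X, center, sigma):
    # One candidate column: a Gaussian kernel centered at one training
    # point, evaluated on all training points.
    d2 = np.sum((X - center) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def cg_boost(X, y, sigmas=(0.5, 1.0, 2.0), C=10.0, max_cols=30, tol=1e-4):
    # X: (n, d) inputs; y: (n,) labels in {-1, +1}.
    n = len(y)
    cols, params = [], []        # generated columns and their (center, sigma, sign)
    u = np.full(n, 1.0 / n)      # initial dual weights (misclassification costs)
    a = np.zeros(0)
    for _ in range(max_cols):
        # Pricing step: scan the kernel library for the most violated
        # dual constraint  sum_i u_i y_i h(x_i) <= 1.
        best, best_score = None, 0.0
        for sigma in sigmas:
            for c in range(n):
                h = gaussian_column(X, X[c], sigma)
                score = float(np.dot(u * y, h))
                if abs(score) > best_score:
                    best, best_score = (c, sigma, np.sign(score)), abs(score)
        if best is None or (cols and best_score <= 1.0 + tol):
            break                # no violated column: restricted master is optimal
        c, sigma, s = best
        cols.append(s * gaussian_column(X, X[c], sigma))
        params.append((c, sigma, s))
        # Restricted master problem (1-norm regularization):
        #   min  sum_j a_j + C sum_i xi_i
        #   s.t. y_i sum_j a_j h_j(x_i) + xi_i >= 1,  a, xi >= 0.
        H = np.column_stack(cols)
        k = H.shape[1]
        obj = np.concatenate([np.ones(k), C * np.ones(n)])
        A_ub = -np.hstack([y[:, None] * H, np.eye(n)])
        res = linprog(obj, A_ub=A_ub, b_ub=-np.ones(n),
                      bounds=[(0, None)] * (k + n), method="highs")
        a = res.x[:k]
        u = -res.ineqlin.marginals   # duals of the margin constraints
    return a, params
```

The learned model is f(x) = sum_j a_j s_j K_{sigma_j}(x, x_{c_j}), and the 1-norm penalty on a drives most coefficients to zero, which is the source of the sparsity claimed above. The 2-norm variant replaces the restricted master LP with a quadratic program whose duals play the same role, and the scaling idea of prioritizing columns corresponds to ordering or restricting the candidates scanned in the pricing loop.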
