Two-Stage Learning Kernel Algorithms

This paper examines two-stage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignment-based techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definition of kernel alignment and significantly extend that work in several directions: we give a novel and simple concentration bound for alignment between kernel matrices; show the existence of good predictors for kernels with high alignment, both for classification and for regression; give algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP; and report the results of extensive experiments with this alignment-based method in classification and regression tasks, which show an improvement both over the uniform combination of kernels and over other state-of-the-art learning kernel methods.
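As a rough illustration of the quantity at stake, the sketch below computes an alignment score between two kernel matrices as a normalized Frobenius inner product, in the spirit of Cristianini et al. (2001). The abstract does not spell out the definition adopted here, so the optional centering step, the function names `center_kernel` and `alignment`, and the toy data are all assumptions made for illustration, not the paper's method.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: Kc = H K H with H = I - 11^T / m."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def alignment(K1, K2, centered=True):
    """Normalized Frobenius inner product <K1, K2>_F / (||K1||_F * ||K2||_F).

    With centered=True both matrices are centered first (an assumption here);
    with centered=False this is the uncentered score of Cristianini et al. (2001).
    """
    if centered:
        K1, K2 = center_kernel(K1), center_kernel(K2)
    num = np.sum(K1 * K2)                            # Frobenius inner product
    den = np.linalg.norm(K1) * np.linalg.norm(K2)    # Frobenius norms
    return num / den

# Hypothetical toy data: four points in the plane with binary labels.
X = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

K_lin = X @ X.T             # linear base kernel
K_target = np.outer(y, y)   # target kernel built from the labels

score = alignment(K_lin, K_target)
self_score = alignment(K_target, K_target)
```

In a two-stage scheme, a score of this kind would be used in the first stage to choose kernel combination weights, and a standard kernel-based learner would then be trained with the resulting kernel in the second stage.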

[1] Nello Cristianini, et al. Kernel Methods for Pattern Analysis, 2003, ICTAI.

[2] Bernhard Schölkopf, et al. Learning with kernels, 2001.

[3] N. Cristianini, et al. On Kernel-Target Alignment, 2001, NIPS.

[4] John Blitzer, et al. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, 2007, ACL.

[5] Mehryar Mohri, et al. Learning Non-Linear Combinations of Kernels, 2009, NIPS.

[6] Maria-Florina Balcan, et al. On a theory of learning with similarity functions, 2006, ICML.

[7] Nello Cristianini, et al. On the Extensions of Kernel Alignment, 2002.

[8] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[9] Mehryar Mohri, et al. L2 Regularization for Learning Kernels, 2009, UAI.

[10] N. Cristianini, et al. Optimizing Kernel Alignment over Combinations of Kernel, 2002.

[11] Charles A. Micchelli, et al. Learning the Kernel Function via Regularization, 2005, J. Mach. Learn. Res.

[12] W. Pyle. A Theory of Learning, 1924.

[13] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[14] Cédric Richard, et al. Optimizing kernel alignment by data translation in feature space, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[15] Alexander J. Smola, et al. Learning the Kernel with Hyperkernels, 2005, J. Mach. Learn. Res.

[16] Shai Ben-David, et al. Learning Bounds for Support Vector Machines with Learned Kernels, 2006, COLT.

[17] Marina Meila, et al. Data centering in feature space, 2003, AISTATS.

[18] Corinna Cortes, et al. Invited talk: Can learning kernels help performance?, 2009, International Conference on Machine Learning.

[19] Mehryar Mohri, et al. Generalization Bounds for Learning Kernels, 2010, ICML.

[20] Francis R. Bach, et al. Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning, 2008, NIPS.