Learning the Kernel Matrix with Semidefinite Programming

Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space---classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data this gives a powerful transductive algorithm---using the labeled part of the data one can learn an embedding also for the unlabeled part. The similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Furthermore, this approach leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.

[1]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[2]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[3]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[4]  M. Talagrand,et al.  Probability in Banach Spaces: Isoperimetry and Processes , 1991 .

[5]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[6]  L. Breiman Arcing Classifiers , 1998 .

[7]  John C. Platt Using Analytic QP and Sparseness to Speed Training of Support Vector Machines , 1998, NIPS.

[8]  Stephen P. Boyd,et al.  Determinant Maximization with Linear Matrix Inequality Constraints , 1998, SIAM J. Matrix Anal. Appl..

[9]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[10]  Jos F. Sturm,et al.  A Matlab toolbox for optimization over symmetric cones , 1999 .

[11]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[12]  Kristin P. Bennett,et al.  Duality and Geometry in SVM Classifiers , 2000, ICML.

[13]  Knud D. Andersen,et al.  The Mosek Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm , 2000 .

[14]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[15]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[16]  Matthias W. Seeger,et al.  PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification , 2003, J. Mach. Learn. Res..

[17]  John D. Lafferty,et al.  Diffusion Kernels on Graphs and Other Discrete Input Spaces , 2002, ICML.

[18]  V. Koltchinskii,et al.  Empirical margin distributions and bounding the generalization error of combined classifiers , 2002, math/0405343.

[19]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .

[20]  Gert R. G. Lanckriet,et al.  Convex Tuning of the Soft Margin Parameter , 2003 .

[21]  Thomas Hofmann,et al.  Text categorization by boosting automatically extracted concepts , 2003, SIGIR.

[22]  D. Madigan,et al.  Sparse Bayesian Classifiers for Text Categorization , 2003 .

[23]  Ting Chen,et al.  An integrated probabilistic model for functional prediction of proteins , 2003, RECOMB '03.

[24]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[25]  Yan Huang Support Vector Machines for Text Categorization Based on Latent Semantic Indexing , 2003 .

[26]  Nello Cristianini,et al.  Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast , 2003, Pacific Symposium on Biocomputing.

[27]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.