On Kernel-Target Alignment

We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction.

[1]  Alex Pothen,et al.  PARTITIONING SPARSE MATRICES WITH EIGENVECTORS OF GRAPHS* , 1990 .

[2]  Saburou Saitoh,et al.  Theory of Reproducing Kernels and Its Applications , 1988 .

[3]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[4]  B. Carl,et al.  Entropy, Compactness and the Approximation of Operators , 1990 .

[5]  David H. Wolpert,et al.  On the Connection between In-sample Testing and Generalization Error , 1992, Complex Syst..

[6]  O. Mangasarian,et al.  Robust linear programming discrimination of two linearly inseparable sets , 1992 .

[7]  D. Wolpert OFF-TRAINING SET ERROR AND A PRIORI DISTINCTIONS BETWEEN LEARNING ALGORITHMS , 1994 .

[8]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[9]  Prabhakar Raghavan,et al.  Sparse matrix reordering schemes for browsing hypertext , 1996 .

[10]  中澤 真,et al.  Devroye, L., Gyorfi, L. and Lugosi, G. : A Probabilistic Theory of Pattern Recognition, Springer (1996). , 1997 .

[11]  John Shawe-Taylor,et al.  Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[12]  Alan M. Frieze,et al.  Clustering in large graphs and matrices , 1999, SODA '99.

[13]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[14]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[15]  B. Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, ICML.

[16]  Nello Cristianini,et al.  Composite Kernels for Hypertext Categorisation , 2001, ICML.

[17]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.

[18]  Jason Weston,et al.  Gene functional classification from heterogeneous data , 2001, RECOMB.

[19]  Hans Ulrich Simon,et al.  Estimating the Optimal Margins of Embeddings in Euclidean Half Spaces , 2001, COLT/EuroCOLT.

[20]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[21]  Ralf Herbrich,et al.  Learning Kernel Classifiers: Theory and Algorithms , 2001 .

[22]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .