Convolution kernels on discrete structures

We introduce a new method of constructing kernels on sets whose elements are discrete structures such as strings, trees, and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random fields, generalized regular expressions, pair-HMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
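
As a reading aid, here is a minimal Python sketch of the convolution construction. It assumes the usual R-convolution form: a relation R links a structure x to tuples of parts (x_1, ..., x_D), and K(x, y) sums, over every part-tuple of x and every part-tuple of y, the product of part kernels K_d(x_d, y_d). The relation `two_way_splits`, the base kernels `exact_match`, `constant_one` and `gaussian_1d`, and all other names are hypothetical illustrations, not notation taken from the paper.

```python
import math
from itertools import product
from typing import Any, Callable, Iterable, Sequence, Tuple

Kernel = Callable[[Any, Any], float]
Decomposer = Callable[[Any], Iterable[Tuple[Any, ...]]]


def r_convolution(part_kernels: Sequence[Kernel], decompose: Decomposer) -> Kernel:
    """Sketch: sum over all pairs of part-tuples of x and y of the product of part kernels."""
    def k(x: Any, y: Any) -> float:
        total = 0.0
        for xs, ys in product(decompose(x), decompose(y)):
            term = 1.0
            for k_d, x_d, y_d in zip(part_kernels, xs, ys):
                term *= k_d(x_d, y_d)
            total += term
        return total
    return k


# Example 1 (hypothetical relation): (x1, x2) are parts of s iff x1 + x2 == s.
def two_way_splits(s: str):
    return [(s[:i], s[i:]) for i in range(len(s) + 1)]


def exact_match(a: Any, b: Any) -> float:
    # Positive definite: inner product of indicator vectors.
    return 1.0 if a == b else 0.0


def constant_one(a: Any, b: Any) -> float:
    # Positive definite constant kernel.
    return 1.0


prefix_kernel = r_convolution([exact_match, constant_one], two_way_splits)


# Example 2: a vector has a single part-tuple, its coordinates.
def gaussian_1d(sigma: float) -> Kernel:
    return lambda a, b: math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def coordinates(v):
    return [tuple(v)]


sigma = 1.0
rbf_via_convolution = r_convolution([gaussian_1d(sigma)] * 2, coordinates)

print(prefix_kernel("abc", "abd"))
print(rbf_via_convolution((0.0, 0.0), (1.0, 1.0)), math.exp(-1.0))
```

With these choices, `prefix_kernel(x, y)` counts the common prefixes of x and y, including the empty prefix, so `prefix_kernel("abc", "abd")` is 3.0. In the second example, each vector has exactly one part-tuple, so the sum collapses to a single product of one-dimensional Gaussians. That product equals the Gaussian RBF kernel on R^2 with the same sigma.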
