Diffusion Kernels on Statistical Manifolds

A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated; improvements are obtained over Gaussian and linear kernels, which have been the standard for text classification.
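
To make the multinomial special case concrete, the following is a minimal sketch and not the authors' implementation. It assumes the leading-order approximation of the heat kernel, in which the squared Euclidean distance of the Gaussian kernel is replaced by the squared geodesic distance under the Fisher information metric. For multinomial parameters this distance is 2 arccos(sum_i sqrt(p_i q_i)). The function names and the bandwidth parameter t are illustrative choices, not taken from the text above.

```python
import math


def fisher_geodesic_distance(p, q):
    """Fisher geodesic distance between two multinomial parameter vectors.

    p and q are assumed to be non-negative sequences of equal length
    that each sum to 1.
    """
    # Inner product of the square-root vectors, which lie on the unit sphere.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] so floating-point rounding cannot push acos out of its domain.
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)


def multinomial_diffusion_kernel(p, q, t):
    """Leading-order heat kernel approximation on the multinomial simplex.

    t > 0 plays the role of the Gaussian bandwidth (diffusion time).
    """
    n = len(p) - 1  # dimension of the simplex
    d = fisher_geodesic_distance(p, q)
    return (4.0 * math.pi * t) ** (-n / 2.0) * math.exp(-d * d / (4.0 * t))
```

In a text-classification setting, one plausible use is to map each document to a point on the simplex by normalizing its term counts and then pass the resulting kernel matrix to any kernel-based learner. That embedding is an assumption of this sketch.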
