A Family of Probabilistic Kernels Based on Information Divergence

Probabilistic kernels offer a way to combine generative models with discriminative classifiers. We establish connections between probabilistic kernels and feature space kernels through a geometric interpretation of the previously proposed probability product kernel. A family of probabilistic kernels, based on information divergence measures, is then introduced and its connections to various existing probabilistic kernels are analyzed. The new family is shown to provide a unifying framework for the study of various important questions in kernel theory and practice. We exploit this property to design a set of experiments that yield interesting results regarding the role of properties such as linearity, positive definiteness, and the triangle inequality in kernel performance.
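For concreteness, the two kernel families mentioned above can be sketched as follows. These are illustrative forms only, not necessarily the exact definitions used in the report; the exponent rho, the scale parameter a, and the choice of divergence D are assumptions made here for the sake of the example.

% Probability product kernel between two densities p and q (rho > 0).
% rho = 1/2 gives the Bhattacharyya kernel, rho = 1 the expected likelihood kernel.
\[
  K_\rho(p, q) \;=\; \int p(x)^{\rho}\, q(x)^{\rho}\, dx .
\]
% One common divergence-based construction: exponentiate a symmetrized
% information divergence D (e.g. symmetrized Kullback-Leibler or
% Jensen-Shannon), with a > 0 a scale parameter.
\[
  K_D(p, q) \;=\; \exp\!\bigl(-a\,[\,D(p \,\|\, q) + D(q \,\|\, p)\,]\bigr).
\]

For common generative models such as Gaussians, both forms admit closed-form expressions, which is part of what makes them practical as SVM kernels over distributions.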
