Probabilistic Distance Measures in Reproducing Kernel Hilbert Space

Probabilistic distance measures are important quantities in many research areas. For example, the Chernoff distance (with the Bhattacharyya distance as a special case) is often used to bound the Bayes error in pattern classification, and the Kullback-Leibler (KL) distance is a key quantity in information theory. However, computing these distances is difficult, and analytic solutions are available only under special circumstances, the most popular being the Gaussian density. Since the Gaussian density employs only up to second-order statistics, it is rather limited. In this paper, we enhance this capacity through a nonlinear mapping from the original data space to a reproducing kernel Hilbert space, implemented by a kernel embedding. Because this mapping is nonlinear, evaluating the distances in the induced space is nontrivial; we present a new approach to studying these distances and demonstrate its feasibility and efficiency experimentally.
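For concreteness, the Gaussian special case alluded to above has well-known closed forms (standard results, stated here for reference rather than quoted from the paper). Writing p_i = \mathcal{N}(\mu_i, \Sigma_i), i = 1, 2, in d dimensions, the Chernoff distance is

C_\alpha(p_1, p_2) = -\ln \int p_1(x)^{1-\alpha}\, p_2(x)^{\alpha}\, dx, \qquad 0 < \alpha < 1,

which at \alpha = 1/2 reduces to the Bhattacharyya distance

D_B(p_1, p_2) = \tfrac{1}{8}\,(\mu_2 - \mu_1)^\top \bar{\Sigma}^{-1} (\mu_2 - \mu_1) + \tfrac{1}{2}\ln \frac{|\bar{\Sigma}|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}, \qquad \bar{\Sigma} = \tfrac{1}{2}(\Sigma_1 + \Sigma_2),

while the KL divergence is

\mathrm{KL}(p_1 \,\|\, p_2) = \tfrac{1}{2}\Big[\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - d + \ln \tfrac{|\Sigma_2|}{|\Sigma_1|}\Big].

The sketch below illustrates one plausible reading of the general idea, not the paper's actual algorithm: project two samples into a low-dimensional kernel feature space (here via kernel PCA with an RBF kernel; the kernel choice, dimensionality, and helper names are assumptions made for illustration) and evaluate the closed-form Gaussian Bhattacharyya distance on the projected second-order statistics.

# A minimal, self-contained sketch (assumed setup, not the paper's method).
import numpy as np
from sklearn.decomposition import KernelPCA

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    # Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    maha = 0.125 * diff @ np.linalg.solve(cov, diff)
    logdet = np.linalg.slogdet(cov)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    return maha + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(200, 5))  # sample from population 1
X2 = rng.normal(0.5, 1.5, size=(200, 5))  # sample from population 2

# Nonlinear mapping: explicit coordinates in a kernel-PCA subspace.
kpca = KernelPCA(n_components=4, kernel="rbf", gamma=0.2)
Z = kpca.fit_transform(np.vstack([X1, X2]))
Z1, Z2 = Z[:200], Z[200:]

# Second-order statistics of each sample in the feature subspace.
d_b = bhattacharyya_gaussian(Z1.mean(axis=0), np.cov(Z1, rowvar=False),
                             Z2.mean(axis=0), np.cov(Z2, rowvar=False))
print(f"Bhattacharyya distance in the kernel feature space: {d_b:.4f}")

Because the Gaussians are fitted to explicit kernel-PCA coordinates, this sketch sidesteps the infinite-dimensional RKHS entirely; the abstract's point is precisely that such distance computations require a dedicated treatment once the mapping is nonlinear.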
