Graph Kernels from the Jensen-Shannon Divergence

Graph-based representations have proved powerful in computer vision. With large amounts of graph data, the challenge is the computational burden of edit distance computation. Graph kernels can be used to formulate efficient algorithms for high-dimensional data, and have proved an elegant way to overcome this computational bottleneck. In this paper, we investigate whether the Jensen-Shannon divergence can be used to establish a graph kernel. The Jensen-Shannon kernel is a nonextensive information-theoretic kernel, and is defined using the entropy and mutual information computed from probability distributions over the structures being compared. To establish a Jensen-Shannon graph kernel, we explore two different approaches. The first is based on the von Neumann entropy associated with a graph. The second uses the Shannon entropy associated with the probability state vector of a steady-state random walk on a graph. We compare the two resulting graph kernels on the problem of graph clustering, using kernel principal component analysis (kPCA) to embed the graphs into a feature space. Experimental results reveal that the method gives good classification results on graphs extracted both from an object recognition database and from a bioinformatics application.
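The Jensen-Shannon kernel mentioned in the abstract can be illustrated on two discrete probability distributions P and Q defined over a common support. The following is a minimal sketch, not the paper's graph construction: it assumes natural logarithms, uses JSD(P, Q) = H((P + Q)/2) - (H(P) + H(Q))/2, which lies between 0 and ln 2, and takes the kernel as k(P, Q) = ln 2 - JSD(P, Q). Equivalently, JSD(P, Q) is the mutual information between a sample drawn from the equal mixture of P and Q and the binary label recording which of the two it came from. How each graph is turned into such a distribution is not reproduced here, and the helper names are hypothetical.

```python
import numpy as np


def shannon_entropy(p) -> float:
    """Shannon entropy in nats of a probability vector (0 * log 0 is taken as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))


def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence H((p+q)/2) - (H(p)+H(q))/2, in [0, ln 2]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # equal-weight mixture of the two distributions
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


def js_kernel(p, q) -> float:
    """Jensen-Shannon kernel k(p, q) = ln 2 - JSD(p, q) (assumed form)."""
    return float(np.log(2.0) - js_divergence(p, q))


def kernel_matrix(dists) -> np.ndarray:
    """Pairwise kernel matrix over a list of distributions on a common support."""
    n = len(dists)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = js_kernel(dists[i], dists[j])
    return K
```

For two distributions with disjoint supports, such as [0.5, 0.5, 0] and [0, 0, 1], the divergence reaches ln 2 and the kernel value is 0; for identical distributions the kernel value is ln 2.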
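The two graph entropies named in the abstract can be sketched from an adjacency matrix. The formulas below are assumptions, since the abstract does not state them. For the von Neumann entropy, the density matrix is taken as rho = L_norm / |V|, where L_norm = I - D^(-1/2) A D^(-1/2) is the normalized Laplacian (its trace is |V| when there are no isolated vertices, so rho has unit trace), and H_VN = -sum_i lambda_i ln lambda_i over the nonzero eigenvalues of rho. For the second approach, the steady-state random walk is taken to be the simple random walk on an undirected graph, whose stationary probability at vertex v is d_v / sum_u d_u; the Shannon entropy of that vector is returned. The sketch assumes an undirected graph with a symmetric adjacency matrix and no isolated vertices; the function names are hypothetical.

```python
import numpy as np


def von_neumann_entropy(A) -> float:
    """Von Neumann entropy of rho = L_norm / |V| (assumed density matrix)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    if np.any(d == 0):
        raise ValueError("isolated vertices are not handled in this sketch")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalized Laplacian I - D^{-1/2} A D^{-1/2}.
    L_norm = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam = np.linalg.eigvalsh(L_norm) / n  # eigenvalues of rho, summing to 1
    lam = lam[lam > 1e-12]                # drop zero (and round-off) eigenvalues
    return float(-np.sum(lam * np.log(lam)))


def random_walk_entropy(A) -> float:
    """Shannon entropy of the stationary distribution of the simple random walk."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    p = d / d.sum()  # stationary probability is proportional to vertex degree
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

For the triangle K3 the normalized Laplacian has eigenvalues 0, 3/2 and 3/2, so rho has eigenvalues 0, 1/2 and 1/2 and the von Neumann entropy is ln 2, while the random-walk entropy is ln 3 because all three vertices have the same degree.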
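Once a kernel matrix over a set of graphs is available, the kPCA embedding step can be sketched with the standard construction: double-centre the kernel matrix, take its leading eigenvectors, and scale each eigenvector by the square root of its eigenvalue to obtain coordinates for the training graphs. This is a generic sketch rather than the paper's code; the function name and the default of two components are assumptions, and negative eigenvalues (which can appear when a kernel matrix is not positive semidefinite) are clipped to zero.

```python
import numpy as np


def kernel_pca_embedding(K, n_components: int = 2) -> np.ndarray:
    """Embed the n training items of an n x n kernel matrix with kernel PCA."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Double centering: centres the implicit feature vectors at the origin.
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)  # returned in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[idx], 0.0, None)
    # Projection of item i onto component k is sqrt(lambda_k) * v_k[i].
    return eigvecs[:, idx] * np.sqrt(vals)
```

With a linear kernel K = X Xᵀ the embedding reproduces ordinary PCA scores up to the sign of each axis, so pairwise distances between the embedded points match those between the rows of X, which is a convenient sanity check.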
