Structure-based graph distance measures of high degree of precision

In recent years, evaluating graph distance has become increasingly important in a variety of real applications, and many graph distance measures have been proposed. Among these, structure-based graph distance measures have become a research focus because they do not depend on the definition of cost functions. However, existing structure-based graph distance measures have a low degree of precision because they use only the node and edge information of graphs. To improve precision, we define the substructure abundance vector (SAV) to capture more substructure information of a graph. Based on the SAV, we propose unified graph distance measures that generalize the existing structure-based graph distance measures. In general, the unified graph distance measures can evaluate graph distance at a much finer granularity. We also show that unified graph distance measures based on occurrence mapping, and some of their variants, are metrics. Finally, we apply the unified graph distance metric and its variants to population evolution analysis and construct distance graphs of marker networks in three populations, which reflect the differences in single nucleotide polymorphism (SNP) linkage disequilibrium (LD) among these populations.
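
The abstract does not give the formal definition of the SAV or of the unified distance, so the following is only an illustrative sketch of the general idea that counting more kinds of substructure can separate graphs more finely than node and edge information alone. Everything in it is an assumption made for illustration: the choice of substructures (nodes, edges, triangles), the function names `substructure_counts` and `count_vector_distance`, and the min/max ratio used as the distance. It compares raw counts only and is not the occurrence-mapping-based measure proposed in the paper.

```python
from itertools import combinations


def substructure_counts(edges):
    """Count nodes, edges, and triangles of a simple undirected graph.

    Hypothetical helper: the graph is given as a list of (u, v) pairs,
    so isolated nodes are not represented. Node labels must be sortable.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_nodes = len(adj)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    n_triangles = sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return (n_nodes, n_edges, n_triangles)


def count_vector_distance(x, y):
    """Illustrative distance: 1 - sum(min) / sum(max) over two count vectors.

    This is an assumed stand-in, not the paper's unified distance measure.
    """
    total_max = sum(max(a, b) for a, b in zip(x, y))
    if total_max == 0:
        return 0.0
    total_min = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - total_min / total_max


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]

sav_triangle = substructure_counts(triangle)  # (3, 3, 1)
sav_path = substructure_counts(path)          # (3, 2, 0)

# Using node and edge counts only versus also counting triangles.
coarse = count_vector_distance(sav_triangle[:2], sav_path[:2])  # 1 - 5/6
fine = count_vector_distance(sav_triangle, sav_path)            # 1 - 5/7
```

In this toy case the triangle count adds a component on which the two graphs differ, so the distance grows from 1/6 to 2/7. A count-only comparison like this cannot distinguish non-isomorphic graphs that happen to share the same counts, which is one reason it should not be read as the measure defined in the paper.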
