Quick Mining of Isomorphic Exact Large Patterns from Large Graphs

Applications of subgraph isomorphism search are multiplying as more fields model their systems as graphs or networks. In particular, many biological systems, such as protein interaction networks, molecular structures and protein contact maps, are modeled as graphs. Subgraph isomorphism search finds all subgraphs of a target graph that are isomorphic to a given query graph; the existence of such subgraphs can reveal characteristics of the modeled system. The most computationally expensive step in the search is the backtracking algorithm that traverses the nodes of the target graph. In this paper, we propose a pruning approach, inspired by the minimum remaining values (MRV) heuristic, that scales better to large query and target graphs. Our tests on various biological networks show that our approach outperforms existing state-of-the-art approaches by a factor of 6x to 53x.
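To make the idea concrete, the following is a minimal sketch of backtracking subgraph search with a generic minimum-remaining-values ordering. It is not the paper's algorithm, only an illustration of the heuristic the abstract names. It assumes unlabeled, undirected graphs stored as dictionaries mapping each node to its set of neighbors, and it looks for non-induced matches (every query edge must map to a target edge). The names `find_embeddings`, `live_candidates` and `extend` are hypothetical.

```python
def find_embeddings(query, target):
    """Yield each injective query->target node mapping that preserves query edges.

    Both graphs are dicts of node -> set of neighbours (assumed representation).
    """
    # Degree filter: a target node can host a query node only if it has
    # at least as many neighbours.
    initial = {
        q: {t for t in target if len(target[t]) >= len(query[q])}
        for q in query
    }

    def live_candidates(q, mapping, used):
        # Candidates still consistent with the partial mapping: unused, and
        # adjacent to the image of every already-mapped neighbour of q.
        return [
            t for t in initial[q]
            if t not in used
            and all(mapping[n] in target[t] for n in query[q] if n in mapping)
        ]

    def extend(mapping, used):
        if len(mapping) == len(query):
            yield dict(mapping)
            return
        # MRV step: branch on the unmapped query node with the fewest
        # remaining candidates, so dead ends are detected early.
        best_q, best_c = None, None
        for q in query:
            if q in mapping:
                continue
            c = live_candidates(q, mapping, used)
            if best_c is None or len(c) < len(best_c):
                best_q, best_c = q, c
            if not c:
                break  # an empty candidate list prunes this whole branch
        for t in best_c:
            mapping[best_q] = t
            used.add(t)
            yield from extend(mapping, used)
            del mapping[best_q]
            used.remove(t)

    yield from extend({}, set())


# Usage: a triangle query against a triangle with one pendant node.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
target = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
matches = list(find_embeddings(triangle, target))
```

Recomputing the candidate lists at every step keeps the sketch short; a practical implementation would more likely maintain them incrementally.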
