Structure and attribute index for approximate graph matching in large graphs

The increasing popularity of graph data in various domains has led to renewed interest in efficient graph matching techniques, especially for large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph: given a large attributed graph and a query graph, we compute the subgraph of the large graph that best matches the query graph. We propose a novel structure-aware and attribute-aware index for this task. We first construct a similarity index by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity. We then construct a connectivity-based index that gives a concise representation of inter-partition connections. We use the index to find a set of best matching paths and, from these paths, compute the best matching answer graph using a greedy algorithm. Experimental results on real datasets demonstrate the efficiency of both index construction and query processing. We also show that our approach attains high-quality query answers.
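
The pipeline described above (partition, record inter-partition connections, then assemble an answer greedily) can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes that nodes with identical attribute labels form one partition, where the paper uses similarity-based partitioning, and it matches single query edges rather than best matching paths. The names `build_index` and `greedy_match` are hypothetical.

```python
from collections import defaultdict


def build_index(adj, attrs):
    """Group nodes by attribute label (a simplified stand-in for
    similarity-based partitioning) and record, for each ordered pair of
    labels, the data edges that connect the two partitions."""
    partition = defaultdict(set)
    for node, label in attrs.items():
        partition[label].add(node)
    links = defaultdict(set)  # (label_u, label_v) -> {(u, v), ...}
    for u, neighbours in adj.items():
        for v in neighbours:
            links[(attrs[u], attrs[v])].add((u, v))
    return partition, links


def greedy_match(query_edges, query_attrs, links):
    """Greedily map query nodes to data nodes, one query edge at a time.
    A query edge with no compatible data edge is skipped, so the result
    may be a partial (approximate) match."""
    mapping = {}
    matched = 0

    def fits(q, d):
        # q must already map to d, or both q and d must still be unused.
        if q in mapping:
            return mapping[q] == d
        return d not in mapping.values()

    for qu, qv in query_edges:
        candidates = links.get((query_attrs[qu], query_attrs[qv]), set())
        for u, v in sorted(candidates):
            if fits(qu, u) and fits(qv, v):
                mapping[qu], mapping[qv] = u, v
                matched += 1
                break
    return mapping, matched


# Toy undirected data graph: the path 1 - 2 - 3 - 4 with labels A, B, C, A.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
attrs = {1: "A", 2: "B", 3: "C", 4: "A"}
partition, links = build_index(adj, attrs)

# Query path x - y - z with labels A, B, C.
query_edges = [("x", "y"), ("y", "z")]
query_attrs = {"x": "A", "y": "B", "z": "C"}
mapping, matched = greedy_match(query_edges, query_attrs, links)
```

Here both query edges find a compatible data edge, so `mapping` covers all three query nodes. If a query edge refers to a label pair that has no entry in `links`, that edge is skipped and the function returns a partial mapping together with the number of edges it could match.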
