Graph pattern matching

Graph pattern matching is typically defined in terms of subgraph isomorphism, which makes it an NP-complete problem. Moreover, it requires bijective functions, which are often too restrictive to characterize patterns in emerging applications. We propose a class of graph patterns in which an edge denotes connectivity in a data graph within a predefined number of hops. In addition, we define matching based on a notion of bounded simulation, an extension of graph simulation. We show that with this revision, graph pattern matching can be performed in cubic time, and we provide an algorithm that achieves this bound. We also develop algorithms for incrementally finding matches when data graphs are updated, with performance guarantees for DAG patterns. We experimentally verify that these algorithms scale well, and that the revised notion of graph pattern matching allows us to identify communities commonly found in real-world networks.
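To make the idea of bounded simulation concrete, the following is a minimal sketch, not the cubic-time algorithm referred to above. It assumes that pattern nodes carry a single required label, that each pattern edge carries either an integer hop bound or None for "any number of hops", and that a pattern edge is satisfied by a nonempty path in the data graph. The names within_hops and bounded_simulation, and the dictionary-based graph encoding, are hypothetical choices made for illustration. The sketch starts from all label-compatible candidates and repeatedly discards data nodes that cannot reach a candidate for some pattern successor within the bound.

```python
from collections import deque


def within_hops(graph, source, bound):
    """Nodes reachable from `source` by a nonempty path of at most `bound` edges.

    `bound` may be None, meaning the path length is unrestricted (assumption).
    """
    reached = set()
    depth_of = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        depth = depth_of[node]
        if bound is not None and depth == bound:
            continue  # expanding further would exceed the hop bound
        for nxt in graph.get(node, ()):
            reached.add(nxt)
            if nxt not in depth_of:
                depth_of[nxt] = depth + 1
                frontier.append(nxt)
    return reached


def bounded_simulation(pattern_nodes, pattern_edges, graph, labels):
    """Naive fixpoint computation of a bounded-simulation match relation.

    pattern_nodes: pattern node -> required label
    pattern_edges: (u, u2) -> hop bound (int) or None
    graph:         data node -> list of successor data nodes
    labels:        data node -> label
    Returns pattern node -> set of matching data nodes, or {} if some
    pattern node has no match.
    """
    # Start with every data node whose label fits the pattern node.
    sim = {
        u: {v for v, lab in labels.items() if lab == wanted}
        for u, wanted in pattern_nodes.items()
    }
    reach_cache = {}
    changed = True
    while changed:
        changed = False
        for (u, u2), bound in pattern_edges.items():
            for v in list(sim[u]):
                key = (v, bound)
                if key not in reach_cache:
                    reach_cache[key] = within_hops(graph, v, bound)
                # v survives only if it reaches some candidate of u2 in time.
                if not (reach_cache[key] & sim[u2]):
                    sim[u].discard(v)
                    changed = True
    if any(not candidates for candidates in sim.values()):
        return {}
    return sim


# Hypothetical example: pattern "A reaches B within 2 hops".
pattern_nodes = {"pa": "A", "pb": "B"}
pattern_edges = {("pa", "pb"): 2}
graph = {"a1": ["x"], "x": ["b1"], "a2": ["x2"], "x2": ["y2"], "y2": ["b2"]}
labels = {"a1": "A", "a2": "A", "x": "X", "x2": "X", "y2": "X",
          "b1": "B", "b2": "B"}
result = bounded_simulation(pattern_nodes, pattern_edges, graph, labels)
```

In this example a1 reaches b1 in two hops and stays a match for pa, while a2 needs three hops to reach b2 and is discarded. Because the relation maps one pattern node to possibly many data nodes, no bijection is required, which is the relaxation the abstract contrasts with subgraph isomorphism.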
