Capturing Topology in Graph Pattern Matching

Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs: a matched graph may have a structure drastically different from that of the pattern graph, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of matched graphs, and show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation by providing a cubic-time algorithm for computing it. (3) We present the locality property of strong simulation, which allows pattern matching to be conducted effectively on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using both real-life and synthetic data.
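To make the topology problem concrete, the following is a minimal sketch of plain graph simulation, the baseline notion that the abstract says falls short. It is not the strong simulation algorithm of this paper. The function name `graph_simulation`, the dictionary-based graph encoding, and the naive fixpoint loop are assumptions made for illustration only. The example matches a two-node cycle pattern against a six-node cycle, which contains no two-node cycle at all.

```python
from collections import defaultdict


def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Naive fixpoint computation of the maximum simulation relation.

    Returns {pattern_node: set(data_nodes)}, or {} if some pattern node
    has no match. Illustrative only; not an optimized algorithm.
    """
    g_succ = defaultdict(set)
    for u, v in g_edges:
        g_succ[u].add(v)
    q_succ = defaultdict(set)
    for u, v in q_edges:
        q_succ[u].add(v)

    # Start with every data node whose label equals the pattern node's label.
    sim = {
        u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
        for u in q_labels
    }

    # Repeatedly drop candidates that cannot follow some pattern edge.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u2 in q_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True

    if any(not candidates for candidates in sim.values()):
        return {}
    return sim


# Pattern: a 2-cycle A -> B -> A (hypothetical example data).
q_labels = {"A": "A", "B": "B"}
q_edges = [("A", "B"), ("B", "A")]

# Data graph: a 6-cycle alternating A and B labels; it has no 2-cycle.
g_labels = {"a1": "A", "b1": "B", "a2": "A", "b2": "B", "a3": "A", "b3": "B"}
g_edges = [("a1", "b1"), ("b1", "a2"), ("a2", "b2"),
           ("b2", "a3"), ("a3", "b3"), ("b3", "a1")]

match = graph_simulation(q_labels, q_edges, g_labels, g_edges)

# A single edge a1 -> b1 does not match: b1 has no successor labelled A.
no_match = graph_simulation(q_labels, q_edges,
                            {"a1": "A", "b1": "B"}, [("a1", "b1")])
```

Here every node of the six-node cycle ends up in the match even though the data graph and the pattern have different cycle structure, which is the kind of mismatch the abstract describes.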
