PRS: Parallel Relaxation Simulation for Massive Graphs

Graph pattern matching is becoming important for a variety of emerging applications such as social network analysis. It is traditionally defined in terms of subgraph isomorphism or graph simulation. These notions, however, often impose constraints that are too strong to identify meaningful matches. In this paper, we propose a new form of graph pattern matching based on a notion of relaxation simulation, which extends graph simulation by allowing some vertices to be absent from a match. We show that relaxation simulation is able to find significant matches that traditional approaches to graph pattern matching fail to catch. Because graphs in practice are often very large, we propose two parallel algorithms that apply relaxation simulation to massive graphs. The algorithms are based on the Bulk Synchronous Parallel (BSP) model and can be easily deployed on cloud computing platforms. Finally, we experimentally verify the effectiveness and efficiency of these algorithms, using real-life data and synthetic data. The results suggest that relaxation simulation is a promising framework for real-life massive graph analysis.
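For orientation, the classic graph simulation that relaxation simulation extends can be sketched as a fixpoint refinement. This is only a minimal, sequential illustration of the standard notion, not the relaxation simulation or the parallel algorithms proposed in the paper; the function name `graph_simulation` and the toy graphs are assumptions made for this example.

```python
def graph_simulation(pattern_edges, pattern_labels, data_edges, data_labels):
    """Naive fixpoint computation of the maximum graph simulation.

    Returns a dict mapping each pattern vertex to the set of data vertices
    that simulate it. If some pattern vertex has no match, every set is
    empty (the pattern does not match the data graph).
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in data_labels}
    for a, b in data_edges:
        succ[a].add(b)

    # Start from all label-compatible candidates.
    sim = {
        u: {v for v, lab in data_labels.items() if lab == pattern_labels[u]}
        for u in pattern_labels
    }

    # Remove candidates that cannot follow some pattern edge, until stable.
    changed = True
    while changed:
        changed = False
        for u, u_child in pattern_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not candidates for candidates in sim.values()):
        return {u: set() for u in sim}
    return sim


# Hypothetical toy input: pattern A -> B -> C.
pattern_labels = {"A": "a", "B": "b", "C": "c"}
pattern_edges = [("A", "B"), ("B", "C")]

# Data graph: a1 -> b1 -> c1 is a full match; a2 -> b2 lacks a "c" child.
data_labels = {"a1": "a", "b1": "b", "c1": "c", "a2": "a", "b2": "b"}
data_edges = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]

result = graph_simulation(pattern_edges, pattern_labels, data_edges, data_labels)
print(result)
```

In this toy input, `a2` and `b2` are discarded because the chain below them is incomplete. Strict simulation drops such partial matches entirely, which is the kind of constraint the abstract describes as too strong.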
