Continuous pattern detection over billion-edge graph using distributed framework

Continuous pattern detection plays an important role in monitoring-related applications. The large size and dynamic update of graphs, along with the massive search space, pose huge challenges in developing an efficient continuous pattern detection system. In this paper, we leverage a distributed graph processing framework to approximately detect a given pattern over a large dynamic graph. We aim to improve the scalability and precision, and reduce the response time and message cost in the detection. We convert a given query pattern into a Single-Sink DAG (Directed Acyclic Graph), and propose an evaluation plan with message transitions on the DAG, which is shorten by SSD plan, to detect the pattern in a large dynamic graph. SSD plan can guide the data graph exploration via messages, and the messages will converge at data sink vertices, which then detect existences of the query pattern. We also conduct join operations over partial vertices during the graph exploration to improve the precision of pattern detection. In addition, we show that SSD plan can support the continuous query over dynamic graphs with slight extensions. We further design various sink vertex selection strategies and neighborhood based transition rule attachment to lower the evaluation cost. The experiments on billion-edge real-life graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method.

[1]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[2]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[3]  Lei Chen,et al.  Continuous Subgraph Pattern Search over Graph Streams , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[4]  Jignesh M. Patel,et al.  TALE: A Tool for Approximate Large Graph Matching , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[5]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[6]  Jiawei Han,et al.  On graph query optimization in large networks , 2010, Proc. VLDB Endow..

[7]  Ambuj K. Singh,et al.  Graphs-at-a-time: query language and access methods for graph databases , 2008, SIGMOD Conference.

[8]  Tianyu Wo,et al.  Distributed graph pattern matching , 2012, WWW.

[9]  Anthony K. H. Tung,et al.  An Efficient Graph Indexing Method , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[10]  Dan Suciu,et al.  Processing XML Streams with Deterministic Automata , 2003, ICDT.

[11]  Jianzhong Li,et al.  Graph pattern matching , 2010, Proc. VLDB Endow..

[12]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[13]  Ambuj K. Singh,et al.  Query Language and Access Methods for Graph Databases , 2010, Managing and Mining Graph Data.

[14]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[15]  Nan Li,et al.  Neighborhood based fast graph search in large networks , 2011, SIGMOD '11.

[16]  Sebastiano Vigna,et al.  A large time-aware web graph , 2008, SIGF.

[17]  Jeffrey Xu Yu,et al.  Taming verification hardness: an efficient algorithm for testing subgraph isomorphism , 2008, Proc. VLDB Endow..

[18]  Amol Deshpande,et al.  Managing large dynamic graphs efficiently , 2012, SIGMOD Conference.

[19]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[20]  Jeong-Hoon Lee,et al.  An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases , 2012, Proc. VLDB Endow..

[21]  Yanlei Diao,et al.  YFilter: efficient and scalable filtering of XML documents , 2002, Proceedings 18th International Conference on Data Engineering.

[22]  Jianzhong Li,et al.  Efficient Subgraph Matching on Billion Node Graphs , 2012, Proc. VLDB Endow..