Path-based holistic detection plan for multiple patterns in distributed graph frameworks

Multiple pattern detection is needed in applications like disease analysis over gene networks, bug detection in program flow networks. This paper takes pattern detection to investigate the evaluation and optimization of multiple jobs in existing distributed graph processing frameworks. The evaluation plan for multiple pattern detection should be parallelizable and can capture and reuse the shared parts among pattern queries easily. In this paper, we design a path-based holistic plan for multiple pattern queries. Specifically, (1) we design a path-based edge-covered plan for an individual pattern. The paths in the plan can be easily captured and reused among different queries. Additionally, the evaluation plan is fully parallelizable, in which each data vertex performs necessary join operations independently during exploring graph. (2) We extend the individual plan to a holistic evaluation plan for multiple queries, whose results are equivalent to those of individual queries. The plan reduces the overall cost by finding frequent paths among queries and reusing the shared part in the holistic plan. (3) We devise various optimization strategies over the holistic plan. The experimental studies, conducted on Giraph, illustrate the high effectiveness of our holistic approaches.

[1]  Jennifer Widom,et al.  Optimizing Graph Algorithms on Pregel-like Systems , 2014, Proc. VLDB Endow..

[2]  Ashraf Aboulnaga,et al.  ReStore: Reusing Results of MapReduce Jobs , 2012, Proc. VLDB Endow..

[3]  Daniel J. Abadi,et al.  Scalable SPARQL querying of large RDF graphs , 2011, Proc. VLDB Endow..

[4]  Chang Zhou,et al.  Toward continuous pattern detection over evolving large graph with snapshot isolation , 2015, The VLDB Journal.

[5]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[6]  Jeong-Hoon Lee,et al.  An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases , 2012, Proc. VLDB Endow..

[7]  Yanlei Diao,et al.  Towards an Internet-Scale XML Dissemination Service , 2004, VLDB.

[8]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[9]  George Kollios,et al.  MRShare , 2010, Proc. VLDB Endow..

[10]  Hoan Anh Nguyen,et al.  Graph-based mining of multiple object usage patterns , 2009, ESEC/FSE '09.

[11]  Jianzhong Li,et al.  Efficient Subgraph Matching on Billion Node Graphs , 2012, Proc. VLDB Endow..

[12]  Philip S. Yu,et al.  Fast Graph Pattern Matching , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[13]  Timos K. Sellis,et al.  On the Multiple-Query Optimization Problem , 1990, IEEE Trans. Knowl. Data Eng..

[14]  Tim Weninger,et al.  Thinking Like a Vertex , 2015, ACM Comput. Surv..

[15]  Peng Gang Sun,et al.  The human Drug-Disease-Gene Network , 2015, Inf. Sci..

[16]  Yanlei Diao,et al.  High-Performance XML Filtering: An Overview of YFilter , 2003, IEEE Data Eng. Bull..

[17]  Feifei Li,et al.  Scalable Multi-query Optimization for SPARQL , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[18]  Ambuj K. Singh,et al.  Graphs-at-a-time: query language and access methods for graph databases , 2008, SIGMOD Conference.

[19]  Chang Zhou,et al.  Continuous pattern detection over billion-edge graph using distributed framework , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[20]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[21]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[22]  Prasan Roy,et al.  Efficient and extensible algorithms for multi query optimization , 1999, SIGMOD '00.

[23]  Guoping Wang,et al.  Multi-Query Optimization in MapReduce Framework , 2013, Proc. VLDB Endow..

[24]  Haixun Wang,et al.  Trinity: a distributed graph engine on a memory cloud , 2013, SIGMOD '13.