A General-Purpose Query-Centric Framework for Querying Big Graphs

Following the lead of Google's Pregel, many distributed systems have been developed for large-scale graph analytics. These systems adopt a user-friendly "think like a vertex" programming model and scale well for tasks in which most vertices of the graph participate in the computation. However, their design can seriously under-utilize cluster resources when processing light-workload graph queries, where only a small fraction of vertices needs to be accessed. In this work, we develop a new open-source system, called Quegel, for querying big graphs. Quegel treats queries as first-class citizens in its design: users only need to specify a Pregel-like algorithm for a generic query, and Quegel processes light-workload graph queries on demand, using a novel superstep-sharing execution model to utilize cluster resources effectively. Quegel further provides a convenient interface for constructing graph indexes, which significantly improve query performance but are not supported by existing graph-parallel systems. Our experiments verify that Quegel answers various types of graph queries efficiently and is up to orders of magnitude faster than existing systems.
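To make the superstep-sharing idea more concrete, the following is a minimal single-process Python sketch. It is not Quegel's API or implementation (Quegel is a distributed system), and the names `bfs_compute`, `superstep_sharing` and the `capacity` parameter are hypothetical. Each query is a point-to-point hop-distance query written in the vertex-centric style. Instead of running queries one after another, up to `capacity` queries advance together so that each shared superstep moves every admitted query one hop forward, and new queries are admitted only at superstep boundaries.

```python
from collections import deque


def bfs_compute(dist, vertex, hops, adj, outbox):
    """Hypothetical vertex-centric UDF for one query at one vertex.

    A vertex reached for the first time records its hop count and sends
    hops + 1 to its neighbours; an already-visited vertex ignores the message.
    """
    if vertex in dist:
        return
    dist[vertex] = hops
    for nbr in adj.get(vertex, ()):
        # All messages of one superstep carry the same hop count in BFS,
        # so overwriting is equivalent to a min-combiner here.
        outbox[nbr] = hops + 1


def superstep_sharing(adj, queries, capacity):
    """Answer (source, target) hop-distance queries, letting up to
    `capacity` queries advance together in each shared superstep.

    Returns (answers, supersteps); an answer is None if target is unreachable.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    answers = [None] * len(queries)
    waiting = deque(range(len(queries)))
    running = {}  # query id -> (dist, inbox); inbox maps vertex -> hop count
    supersteps = 0
    while waiting or running:
        # New queries join only at a superstep boundary.
        while waiting and len(running) < capacity:
            qid = waiting.popleft()
            source = queries[qid][0]
            running[qid] = ({}, {source: 0})
        supersteps += 1
        next_running = {}
        # One shared superstep: every admitted query processes its inbox.
        for qid, (dist, inbox) in running.items():
            outbox = {}
            for vertex, hops in inbox.items():
                bfs_compute(dist, vertex, hops, adj, outbox)
            target = queries[qid][1]
            if target in dist:
                answers[qid] = dist[target]  # query finishes as soon as target is reached
            elif outbox:
                next_running[qid] = (dist, outbox)
            # else: no messages left, so the target is unreachable
        running = next_running
    return answers, supersteps
```

For example, on the undirected path 0-1-2-3 plus a separate edge 4-5 (`adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}`), the queries `[(0, 3), (1, 2), (0, 5), (4, 4)]` yield `[3, 1, None, 0]`. With `capacity=4` they finish within 5 shared supersteps, whereas `capacity=1` (one query at a time) needs 12. This toy count only illustrates why sharing supersteps among light-workload queries can reduce the number of synchronization rounds; it is not a measurement of Quegel.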
