MiniGraph: Querying Big Graphs with a Single Machine

This paper presents MiniGraph, an out-of-core system for querying big graphs with a single machine. Unlike previous single-machine graph systems, MiniGraph adopts a pipelined architecture that overlaps I/O and CPU operations and improves multi-core parallelism. It also introduces a hybrid model that supports both vertex-centric and graph-centric parallel computations, in order to simplify parallel graph programming, speed up beyond-neighborhood computations, and parallelize computations within each subgraph. This hybrid model induces a two-level parallel execution model that exploits both inter-subgraph and intra-subgraph parallelism. Moreover, MiniGraph develops new optimization techniques for its architecture. Using real-life graphs of different types, we show that MiniGraph is up to 76.1x faster than prior out-of-core systems, and that it outperforms some multi-machine systems that use up to 12 machines.
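
To make the idea of overlapping I/O with computation concrete, the following is a minimal sketch, not MiniGraph's implementation. It assumes the graph is already split into subgraphs (lists of edges), and every name in it (`load_subgraph`, `local_degrees`, `run_pipeline`) is hypothetical. A loader thread fills a bounded buffer while a worker pool processes subgraphs that have already been loaded, so loading the next subgraph can proceed while earlier ones are being computed.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def load_subgraph(part_id, partitions):
    """Hypothetical stand-in for reading one subgraph from disk."""
    return partitions[part_id]


def local_degrees(edges):
    """Hypothetical per-subgraph computation: vertex degrees within one subgraph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def run_pipeline(partitions, workers=2, buffer_size=2):
    """Overlap loading and computing, then merge the partial results."""
    # Bounded buffer: the loader blocks once buffer_size subgraphs are waiting.
    loaded = queue.Queue(maxsize=buffer_size)

    def loader():
        for part_id in range(len(partitions)):
            loaded.put(load_subgraph(part_id, partitions))
        loaded.put(None)  # sentinel: no more subgraphs

    loader_thread = threading.Thread(target=loader)
    loader_thread.start()

    total = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        while True:
            subgraph = loaded.get()
            if subgraph is None:
                break
            # Inter-subgraph parallelism: each subgraph is a separate task.
            futures.append(pool.submit(local_degrees, subgraph))
        # Merge the per-subgraph results into global degrees.
        for future in futures:
            for vertex, d in future.result().items():
                total[vertex] = total.get(vertex, 0) + d

    loader_thread.join()
    return total


if __name__ == "__main__":
    parts = [[(0, 1), (1, 2)], [(2, 3)], [(3, 0), (0, 2)]]
    print(run_pipeline(parts))
```

The sketch covers only the pipelining and the inter-subgraph level; intra-subgraph parallelism would further split the work inside `local_degrees`. In CPython, threads mainly overlap I/O waits rather than CPU-bound work, so this illustrates the structure rather than the speedup.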
