Paper information - L-Graph: A General Graph Analytic System on Continuous Computation

L-Graph: A General Graph Analytic System on Continuous Computation

Massive graph analytics has become an important part of many diverse applications. With the growing scale of real-world graphs, efficiently executing analytics over entire graphs has become a challenging problem. Recently, a number of distributed graph processing systems (Pregel, PowerGraph, Trinity) and centralized systems (GraphChi and X-Stream) have been designed. Compared with the high expense of distributed systems deployed on a cluster of commodity machines, centralized systems on cheap PCs are attractive, offering low cost and comparable performance. Through careful analysis, we find that (i) the graph computation abstraction in the centralized systems inherently adopts a batch model similar to that of the distributed systems, which can lead to suboptimal performance, and (ii) the execution model in the centralized systems advocates sequential operations on solid-state disks (SSDs), which are still slower than memory-based operations. To tackle these efficiency issues in centralized systems, we first propose a novel continuous graph computation abstraction. This model continuously processes edges and updates computation results, allowing much faster convergence than the batch model. Second, we propose maintaining vertex states in memory and advocate memory-based operations, which are much faster than sequential operations on SSD. Finally, we design an adaptive memory layout to minimize the overall I/O cost. We develop a proof-of-concept prototype, L-Graph, and implement four example graph analytic applications atop it. Preliminary evaluation on real and synthetic graphs has verified that the proposed continuous model greatly outperforms the widely used batch model, and that L-Graph can achieve much higher efficiency than the state-of-the-art GraphChi.
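
The difference between the batch and continuous models can be illustrated with a small sketch. The abstract does not spell out L-Graph's actual abstraction or API, so everything below is an assumption made for illustration: the function names `batch_components` and `continuous_components` are hypothetical, and min-label propagation for connected components is chosen only as a simple example application. In the batch version, updates computed during a pass become visible only in the next pass; in the continuous version, vertex states are held in memory and each edge sees the latest values immediately.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def batch_components(num_vertices: int, edges: List[Edge]) -> Tuple[List[int], int]:
    """Batch-style min-label propagation (illustrative, not L-Graph's API).

    Each pass reads only the labels from the previous pass, so a label
    moves at most one hop per pass.
    """
    labels = list(range(num_vertices))
    passes = 0
    while True:
        passes += 1
        new_labels = labels[:]  # writes go to a copy, visible next pass
        for u, v in edges:
            m = min(labels[u], labels[v])  # reads see the old state only
            new_labels[u] = min(new_labels[u], m)
            new_labels[v] = min(new_labels[v], m)
        if new_labels == labels:
            return labels, passes
        labels = new_labels


def continuous_components(num_vertices: int, edges: List[Edge]) -> Tuple[List[int], int]:
    """Continuous-style min-label propagation (illustrative assumption).

    Vertex states live in one in-memory list and are updated in place,
    so later edges in the same pass already see the new labels.
    """
    labels = list(range(num_vertices))
    passes = 0
    while True:
        passes += 1
        changed = False
        for u, v in edges:
            m = min(labels[u], labels[v])
            if labels[u] != m or labels[v] != m:
                labels[u] = labels[v] = m  # update is visible immediately
                changed = True
        if not changed:
            return labels, passes


if __name__ == "__main__":
    # A 5-vertex path plus a separate 2-vertex component.
    demo_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
    print(batch_components(7, demo_edges))       # ([0, 0, 0, 0, 0, 5, 5], 5)
    print(continuous_components(7, demo_edges))  # ([0, 0, 0, 0, 0, 5, 5], 2)
```

Both variants reach the same labels, but on this input the continuous version needs fewer passes over the edge list. The gain depends on edge order: with the path edges listed in reverse, in-place updates would propagate the label only one hop per pass, so this sketch shows the mechanism rather than a guaranteed speedup.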
