Supporting On-demand Elasticity in Distributed Graph Processing

While distributed graph processing engines have become popular for processing large graphs, they are typically configured with a static set of servers in the cluster. In other words, they lack the flexibility to scale out or scale in the number of servers when the user requests it. In this paper, we propose the first techniques to make distributed graph processing truly elastic. While supporting on-demand scale-out/in operations, we meet three goals: (i) perform scale-out/in without interrupting the graph computation, (ii) minimize the background network overhead involved in the scale-out/in, and (iii) mitigate stragglers by maintaining load balance across servers. We present and analyze two techniques, Contiguous Vertex Repartitioning (CVR) and Ring-based Vertex Repartitioning (RVR), to address these goals. We implement our techniques in the LFGraph distributed graph processing system and incorporate several systems optimizations. Experiments performed with multiple graph benchmark applications on a real graph indicate that our techniques perform within 9% and 21% of the optimum for scale-out and scale-in operations, respectively.
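The abstract does not spell out how RVR works, so the following is only an illustrative sketch of the general idea its name suggests. It assumes a consistent-hashing ring on which each vertex is owned by the first server found clockwise from the vertex's hash. The class `HashRing`, the helper `ring_position`, and the server names are hypothetical and are not taken from the paper or from LFGraph. The point it illustrates is goal (ii): on such a ring, a scale-out moves only the vertices that the new server takes over, and everything else stays where it is.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key to a point on the ring with a stable hash (assumption: SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical ring partitioner: a vertex belongs to its clockwise successor."""

    def __init__(self, servers):
        # Sorted list of (ring position, server name) pairs.
        self._points = sorted((ring_position(s), s) for s in servers)

    def add_server(self, server):
        # Scale-out: insert the new server at its hashed position.
        bisect.insort(self._points, (ring_position(server), server))

    def remove_server(self, server):
        # Scale-in: drop the server; its vertices fall to its successor.
        self._points.remove((ring_position(server), server))

    def owner(self, vertex_id: int) -> str:
        pos = ring_position(f"v{vertex_id}")
        # First server at or after the vertex position, wrapping around the ring.
        idx = bisect.bisect_left(self._points, (pos, ""))
        return self._points[idx % len(self._points)][1]


def assignment(ring: HashRing, num_vertices: int) -> dict:
    """Return the vertex -> server mapping for vertex ids 0..num_vertices-1."""
    return {v: ring.owner(v) for v in range(num_vertices)}


def moved_vertices(before: dict, after: dict) -> set:
    """Vertices whose owning server differs between two assignments."""
    return {v for v in before if before[v] != after[v]}


if __name__ == "__main__":
    ring = HashRing(["s0", "s1", "s2", "s3"])
    before = assignment(ring, 10_000)

    ring.add_server("s4")  # scale-out by one server
    after = assignment(ring, 10_000)

    moved = moved_vertices(before, after)
    print(f"vertices moved on scale-out: {len(moved)} of {len(before)}")
    print("destinations:", {after[v] for v in moved})
```

A plain ring like this keeps migration traffic low but does not by itself guarantee the load balance named in goal (iii); how the paper addresses that is not stated in the abstract.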
