Exploiting Asynchrony for Performance and Fault Tolerance in Distributed Graph Processing

Author(s): Vora, Keval | Advisor(s): Gupta, Rajiv | Abstract: While various iterative graph algorithms can be expressed via asynchronous parallelism, a lack of proper understanding of it limits the performance benefits that can be achieved via informed relaxations. In this thesis, we capture the algorithmic intricacies and execution semantics that enable us to improve asynchronous processing and to reason about the semantics of asynchronous execution while leveraging its benefits. To this end, we specify the asynchronous processing model in a distributed setting by identifying key properties of read-write dependences and ordering of reads that expose the set of legal executions of an asynchronous program. We then develop techniques to exploit the availability of multiple legal executions by choosing faster executions that reduce communication and computation while processing static and dynamic graphs. For static graphs, we first develop a relaxed consistency protocol that allows the use of stale values during processing in order to eliminate up to 58% of long-latency communication operations, hence accelerating the overall processing by a factor of 2. Then, to efficiently handle machine failures, we present a lightweight confined recovery strategy that quickly constructs an alternate execution state that may differ from any previously encountered program state, but is nevertheless a legal state that guarantees correct asynchronous semantics upon resumption of execution. Our confined recovery strategy enables the processing to finish 1.5-3.2x faster than the traditional recovery mechanism when failures impact 1-6 machines of a 16-machine cluster. We further design techniques based on computation reordering and incremental computation to amortize the computation and communication costs incurred in processing evolving graphs, hence accelerating their processing by up to 10x.
To process streaming graphs, we develop a dynamic dependence-based incremental processing technique that identifies the minimal set of computations required to calculate the change in results that reflects the mutation in graph structure. We show that this technique not only produces correct results, but also improves processing by 8.5-23.7x. Finally, we demonstrate the efficacy of asynchrony beyond the distributed setting by leveraging it to design dynamic partitions that reduce the wasteful disk I/O involved in out-of-core graph processing by 25-76%.
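
The idea of tolerating stale values can be illustrated with a toy, single-process sketch. This is not the protocol developed in the thesis: it assumes a shortest-path computation with non-negative edge weights, and the function name `sssp_with_stale_reads` and the parameter `refresh_every` are hypothetical. Each vertex reads its neighbors' values from a cached copy that is refreshed only occasionally, standing in for remote values that are fetched lazily. Because values only decrease, a stale read is merely a looser upper bound, so the computation still reaches the same fixed point.

```python
import math


def sssp_with_stale_reads(edges, num_vertices, source, refresh_every):
    """Toy shortest-path iteration in which reads may observe stale values.

    edges: list of (u, v, weight) with non-negative weights.
    refresh_every: the cached copy is refreshed at least every this many
        rounds (a hypothetical stand-in for a communication step).
    Returns (distances, number_of_cache_refreshes).
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0.0
    cache = list(dist)  # possibly stale copy that all reads go through

    in_edges = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        in_edges[v].append((u, w))

    rounds = 0
    refreshes = 0
    while True:
        rounds += 1
        changed = False
        for v in range(num_vertices):
            # Neighbor values come from the cache, so they may be out of date.
            candidates = [cache[u] + w for u, w in in_edges[v]]
            best = min([dist[v]] + candidates)
            if best < dist[v]:
                dist[v] = best
                changed = True

        if rounds % refresh_every == 0 or not changed:
            if not changed and cache == dist:
                break  # fixed point reached with fully up-to-date reads
            cache = list(dist)  # the only point where fresh values are fetched
            refreshes += 1

    return dist, refreshes


if __name__ == "__main__":
    graph = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
    for k in (1, 3):
        distances, refreshes = sssp_with_stale_reads(graph, 4, 0, k)
        print(f"refresh_every={k}: distances={distances}, refreshes={refreshes}")
```

Varying `refresh_every` changes how often fresh values are fetched but not the final distances, which is the sense in which several different executions of the same asynchronous program are all legal.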
