Resilient X10: Efficient failure-aware programming

Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes can fail. Computations using traditional libraries such as MPI fail when any component process fails. The advent of MapReduce, Resilient Distributed Datasets, and MillWheel has shown that dramatic improvements in productivity are possible when a high-level programming framework handles scale-out and resilience automatically.

We are concerned with the development of general-purpose languages that support resilient programming. In this paper we show how the X10 language and implementation can be extended to support resilience. In Resilient X10, places may fail asynchronously, causing loss of the data and tasks at the failed place. Failure is exposed through exceptions. We identify a Happens-Before Invariance Principle and require the runtime to automatically repair the global control structure of the program to maintain this principle. We show that this removes much of the burden of resilient programming. The programmer is responsible only for continuing execution with fewer computational resources and the loss of part of the heap, and can do so while taking advantage of domain knowledge.

We build a complete implementation of the language, capable of executing benchmark applications on hundreds of nodes. We describe the algorithms required to make the language runtime resilient. We then give three applications, each with a different approach to fault tolerance (replay, decimation, and domain-level checkpointing).
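As a concrete sketch of the failure model described above (this example is ours, not taken from the paper; the class name ResilientSketch and the printed messages are illustrative), the code below shows the typical Resilient X10 pattern: work is spawned at every place under a finish, and when a place dies the runtime repairs the finish and surfaces the lost tasks as DeadPlaceExceptions wrapped in a MultipleExceptions, after which the program continues on the surviving places with a partial heap.

// Hedged sketch of failure exposed as exceptions in Resilient X10.
public class ResilientSketch {
    public static def main(args: Rail[String]) {
        try {
            // Spawn one task at every place; finish waits for all of them.
            finish for (p in Place.places()) at (p) async {
                // ... application work at place p ...
                // If place p fails here, its tasks and heap are lost.
            }
        } catch (es: MultipleExceptions) {
            // The enclosing finish reports tasks lost to place failure.
            for (e in es.exceptions) {
                if (e instanceof DeadPlaceException)
                    Console.OUT.println("lost work to a dead place: " + e);
            }
            // Continue the computation with fewer places and a partial heap,
            // using domain knowledge to decide how to recover.
        }
    }
}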

[1] Toshio Suganuma, et al. Compiling X10 to Java, 2011, X10 '11.

[2] David Cunningham, et al. Java interoperability in managed X10, 2013, X10 '13.

[3] David Grove, et al. X10 as a Parallel Language for Scientific Computation: Practice and Experience, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[4] David Cunningham, et al. M3R: Increased performance for in-memory Hadoop jobs, 2012, Proc. VLDB Endow.

[5] Toyotaro Suzumura, et al. Scalable performance of ScaleGraph for large scale graph analysis, 2012, 2012 19th International Conference on High Performance Computing.

[6] Martin C. Rinard, et al. Proving acceptability properties of relaxed nondeterministic approximate programs, 2012, PLDI.

[7] Alistair P. Rendell, et al. PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language, 2014, Concurr. Comput. Pract. Exp.

[8] Sriram Krishnamoorthy, et al. Lifeline-based global load balancing, 2011, PPoPP '11.

[9] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[10] Jinsuk Chung, et al. Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems, 2012, HiPC 2012.

[11] Radha Jagadeesan, et al. Concurrent Clustered Programming, 2005, CONCUR.

[12] Tom White. Hadoop: The Definitive Guide, 2009.

[13] Haibo Chen, et al. X10-FT: transparent fault tolerance for APGAS language and runtime, 2013, PMAM '13.

[14] Mahadev Konar, et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems, 2010, USENIX ATC.

[15] Haibo Chen, et al. X10-FT: Transparent fault tolerance for APGAS language and runtime, 2014, Parallel Comput.

[16] Daniel Mills, et al. MillWheel: Fault-Tolerant Stream Processing at Internet Scale, 2013, Proc. VLDB Endow.

[17] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[18] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.

[19] Silvia Crafa, et al. Semantics of (Resilient) X10, 2013, ECOOP.

[20] Laxmikant V. Kalé, et al. Adoption protocols for fanout-optimal fault-tolerant termination detection, 2013, PPoPP '13.