Multiverse: efficiently supporting distributed high-level speculation

Algorithmic speculation, or high-level speculation, is a promising programming paradigm that lets programmers speculatively branch an execution into multiple independent parallel sections and then choose the best (perhaps the fastest) among them. Execution continuing after the speculatively branched section sees only the modifications made by the chosen section. This paradigm lets programmers harness parallelism and can provide dramatic performance improvements. In this paper we present the Multiverse speculative programming model, which exploits parallelism through high-level speculation. Because it speculates across an entire cluster, Multiverse can harness large amounts of parallelism and is not bound by the parallelism available in a single machine. We present abstractions and a runtime that allow programmers to introduce large-scale, high-level speculative parallelism into applications with minimal effort. We introduce a novel on-demand address space sharing mechanism that provides speculations with efficient, transparent access to the original address space of the application (including the use of pointers) across machine boundaries. Multiverse provides single-commit semantics across speculations while guaranteeing isolation between them. We also introduce novel mechanisms that address the scalability bottlenecks arising when there are a large number of speculations. We demonstrate that, for several benchmarks, Multiverse achieves impressive speedups and good scalability across entire clusters. We also study the overheads of the runtime and show that these scalability mechanisms are crucial to scaling cluster-wide.
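
The branch, isolate, and commit pattern described in the abstract can be illustrated with a small single-process sketch. This is not the Multiverse API or runtime: the names `speculate`, `tour_length`, `keep_given_order`, and `visit_in_sorted_order` are hypothetical, isolation is approximated with a deep copy rather than on-demand address space sharing, and the speculations run as local threads rather than across a cluster. The winner here is chosen by lowest cost, which is one possible reading of "best".

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def speculate(state, alternatives, cost):
    """Run each alternative on a private copy of `state` and return the
    copy with the lowest cost (ties go to the earliest alternative).

    Hypothetical helper for illustration only; it is not part of Multiverse.
    """

    def run_isolated(alternative):
        # Isolation: each speculation mutates only its own deep copy.
        private = copy.deepcopy(state)
        alternative(private)
        return private

    with ThreadPoolExecutor(max_workers=len(alternatives)) as pool:
        # pool.map returns results in the order of `alternatives`.
        results = list(pool.map(run_isolated, alternatives))

    # Single commit: exactly one speculation's modifications survive.
    return min(results, key=cost)


def tour_length(state):
    """Total distance travelled when visiting the points in tour order."""
    tour = state["tour"]
    return sum(abs(b - a) for a, b in zip(tour, tour[1:]))


def keep_given_order(state):
    # Alternative 1: leave the tour as it is.
    pass


def visit_in_sorted_order(state):
    # Alternative 2: visit the points from left to right.
    state["tour"].sort()


original = {"tour": [0, 9, 1, 8, 2]}
committed = speculate(
    original, [keep_given_order, visit_in_sorted_order], cost=tour_length
)

print(committed["tour"])  # tour chosen by the winning speculation
print(original["tour"])   # the original state is left untouched
```

Code that continues after the call to `speculate` works only with `committed`, so it observes the changes of a single speculation and none of the losing one's. Selecting the fastest speculation instead would mean returning the first result to complete and discarding the rest.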
