SCaLeM: A Framework for Characterizing and Analyzing Execution Models

As scalable parallel systems evolve towards more complex nodes with many-core architectures and towards larger trans-petascale and upcoming exascale deployments, there is a need to understand, characterize, and quantify the execution models used on such systems. Execution models are a conceptual layer between applications and algorithms on one side and the underlying parallel hardware and systems software on the other. This paper presents the SCaLeM (Synchronization, Concurrency, Locality, Memory) framework for characterizing and analyzing execution models. SCaLeM consists of three basic elements: attributes, compositions, and mappings of these compositions to abstract parallel systems. The fundamental Synchronization, Concurrency, Locality, and Memory attributes are used to characterize each execution model, while combinations of those attributes, in the form of compositions, are used to describe the primitive operations of the execution model. The mapping of an execution model's primitive operations, as described by compositions, to an underlying abstract parallel system can be evaluated quantitatively to determine its effectiveness. Finally, SCaLeM also enables applications to be represented and analyzed in terms of execution models, so that the effectiveness of mapping an application onto a given execution model can be evaluated.
