Multicore scheduling for lightweight communicating processes

Process-oriented programming is a design methodology in which software applications are constructed from communicating concurrent processes. A typical process-oriented design involves the composition of a large number of small isolated component processes. These concurrent components allow for the scalable parallel execution of the resulting application on both shared-memory and distributed-memory architectures. In this paper we present a runtime designed to support process-oriented programming by providing lightweight processes and communication primitives. The runtime's scheduler, implemented using lock-free algorithms, automatically executes concurrent components in parallel on multicore systems. Heuristics dynamically group processes into cache-affine work units based on communication patterns. Work units are then distributed via wait-free work-stealing. Initial performance analysis shows that, using the algorithms presented in this paper, process-oriented software can execute with an efficiency approaching that of optimised sequential and coarse-grain threaded designs.
