Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes

Tuning applications for multicore systems involves subtle concurrency concepts and target-dependent optimizations. This paper advocates a streaming execution model, called ER, where persistent processes communicate and synchronize through a multi-consumer [...] processing applications, we demonstrate the scalability and efficiency advantages of streaming compared to data-driven scheduling. To exploit these benefits in compilers for parallel languages, we propose an intermediate representation that enables the compilation of data-flow tasks into streaming processes. This intermediate representation also facilitates the application of classical compiler optimizations to concurrent programs.
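
As a rough illustration of the streaming style described above, the following Python sketch runs two persistent processes (threads) that communicate and synchronize only through bounded FIFO streams. It is not the ER interface, which the abstract does not show: the names `producer`, `square_stage`, `run_pipeline`, and `SENTINEL` are hypothetical, and the standard-library `queue.Queue` stands in for the paper's communication primitive. Each stream here has a single producer and a single consumer.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker, not part of ER


def producer(out_stream, n):
    """Persistent process: emits 0..n-1, then signals end of stream."""
    for i in range(n):
        out_stream.put(i)  # blocks while the bounded buffer is full
    out_stream.put(SENTINEL)


def square_stage(in_stream, out_stream):
    """Persistent process: stays alive and transforms items as they arrive."""
    while True:
        item = in_stream.get()  # blocks while the buffer is empty
        if item is SENTINEL:
            out_stream.put(SENTINEL)  # forward end of stream downstream
            return
        out_stream.put(item * item)


def run_pipeline(n, capacity=4):
    """Wire producer -> square_stage -> caller through two bounded streams."""
    a = queue.Queue(maxsize=capacity)
    b = queue.Queue(maxsize=capacity)
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square_stage, args=(a, b)),
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = b.get()
        if item is SENTINEL:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(5))
```

Because each stream is a FIFO with one writer and one reader, the output order does not depend on thread scheduling. In contrast to a data-driven scheduler that creates a task per data item, each stage here is created once and persists for the whole stream.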
