Stream Compilation for Real-Time Embedded Multicore Systems

Multicore systems have not only become ubiquitous in the desktop and server worlds, but are also becoming the standard in the embedded space. Multicore offers programability and flexibility over traditional ASIC solutions. However, many of the advantages of switching to multicore hinge on the assumption that software development is simpler and less costly than hardware development. However, the design and development of correct, high-performance, multi-threaded programs is a difficult challenge for most programmers. Stream programming is one model that has wide applicability in the multimedia, signal processing, and networking domains. Streaming is convenient for developers because it separates the creation of actors, or functions that operate on packets of data, from the flow of data through the system. However, stream compilers are generally ineffective for embedded systems because they do not handle strict resource or timing constraints.  Specifically, real-time deadlines and memory size limitations are not handled by conventional stream partitioning and scheduling techniques. This paper introducesthe SPIR compiler that orchestrates the execution of streaming applications with strict memory and timing constraints. Software defined radio or SDR is chosen as the application space to illustrate the effectiveness of the compiler for mapping applications onto the IBM Cell platform.

[1]  Michael K. Chen,et al.  Shangri-La: achieving high performance from compiled network applications while enabling ease of programming , 2005, PLDI '05.

[2]  Henry Hoffmann,et al.  A stream compiler for communication-exposed architectures , 2002, ASPLOS X.

[3]  Guang R. Gao,et al.  Optimal Modulo Scheduling Through Enumeration , 2004, International Journal of Parallel Programming.

[4]  Michael I. Gordon,et al.  Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.

[5]  David Kirk,et al.  NVIDIA cuda software and gpu parallel computing architecture , 2007, ISMM '07.

[6]  Alexandre E. Eichenberger,et al.  Efficient formulation for optimal modulo schedulers , 1997, PLDI '97.

[7]  Krste Asanovic,et al.  Compiling for vector-thread architectures , 2008, CGO '08.

[8]  Calton Pu,et al.  Spidle: A DSL Approach to Specifying Streaming Applications , 2003, GPCE.

[9]  Shuvra S. Bhattacharyya,et al.  Heterogeneous Design in Functional DIF , 2008, SAMOS.

[10]  Hong Song,et al.  A Programming Model for an Embedded Media Processing Architecture , 2005, SAMOS.

[11]  Soonhoi Ha,et al.  PeaCE: A hardware-software codesign environment for multimedia embedded systems , 2008, TODE.

[12]  E.A. Lee,et al.  Synchronous data flow , 1987, Proceedings of the IEEE.

[13]  Scott A. Mahlke,et al.  Orchestrating the execution of stream programs on multicore platforms , 2008, PLDI '08.

[14]  William R. Mark,et al.  Cg: a system for programming graphics hardware in a C-like language , 2003, ACM Trans. Graph..

[15]  R. Ferreira,et al.  Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[16]  Scott A. Mahlke,et al.  From SODA to scotch: The evolution of a wireless baseband processor , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[17]  Monica S. Lam,et al.  Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..

[18]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[19]  Zhaohui Du,et al.  Data and computation transformations for Brook streaming applications on multiprocessors , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[20]  Aaron Smith,et al.  Compiling for EDGE architectures , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[21]  William J. Dally,et al.  Compiling for stream processing , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[22]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..

[23]  Guang R. Gao,et al.  Single-dimension software pipelining for multi-dimensional loops , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[24]  William Thies,et al.  Phased scheduling of stream programs , 2003, LCTES '03.

[25]  S.S. Bhattacharyya,et al.  A hierarchical multiprocessor scheduling system for DSP applications , 1995, Conference Record of The Twenty-Ninth Asilomar Conference on Signals, Systems and Computers.

[26]  Michael Gschwind,et al.  Using advanced compiler technology to exploit the performance of the Cell Broadband EngineTM architecture , 2006, IBM Syst. J..

[27]  Koichi Awazu,et al.  Programming and performance evaluation of the cell processor , 2005, 2005 IEEE Hot Chips XVII Symposium (HCS).

[28]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[29]  B. Ramakrishna Rau,et al.  Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.