Compilation of stream programs for multicore processors that incorporate scratchpad memories

The stream-processing characteristics of many embedded applications in the multimedia and networking domains have led to the advent of stream-based programming formats. Several multicore processors aimed at embedded domains incorporate scratchpad memories (SPMs) because of their superior power consumption characteristics. This paper addresses the problem of compiling stream programs onto multicore processors that incorporate SPMs. Performance optimization on SPM-based processors requires effective schemes for software-managed overlay of code, data, or both. In this problem, the code overlay scheme affects both the mapping of stream elements to cores and the memory available for inter-processor communication. The paper presents an integer linear programming (ILP) formulation and a heuristic approach that exploit the SPM to maximize the throughput of stream programs mapped to multicore processors. Experimental results demonstrate the effectiveness of the proposed techniques by compiling StreamIt-based benchmark applications for the IBM Cell processor and comparing against an existing approach.
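
To make the mapping problem concrete, the sketch below shows one simple way to assign stream elements (filters) to cores under a per-core scratchpad budget. It is not the paper's ILP formulation or its heuristic, which the abstract does not spell out. It is a plain greedy longest-processing-time assignment, and it assumes no code overlay at all: every filter's code must stay resident in the SPM of the core it is mapped to. The names `Core`, `map_filters`, and `bottleneck`, and the cost figures in the usage example, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    spm_free: int                 # bytes of scratchpad still unallocated (assumed budget)
    load: int = 0                 # total work (cycles per steady-state iteration)
    filters: list = field(default_factory=list)


def map_filters(filters, num_cores, spm_bytes):
    """Greedily map filters to cores.

    filters: list of (name, work_cycles, code_bytes) tuples.
    Heaviest filters are placed first, each on the least-loaded core
    whose scratchpad still has room for the filter's code.
    """
    cores = [Core(spm_free=spm_bytes) for _ in range(num_cores)]
    for name, work, size in sorted(filters, key=lambda f: f[1], reverse=True):
        candidates = [c for c in cores if c.spm_free >= size]
        if not candidates:
            # Without overlay, a filter that fits nowhere cannot be mapped.
            raise ValueError(f"filter {name!r} does not fit in any scratchpad")
        target = min(candidates, key=lambda c: c.load)
        target.filters.append(name)
        target.load += work
        target.spm_free -= size
    return cores


def bottleneck(cores):
    """Load of the busiest core; throughput is inversely proportional to it."""
    return max(c.load for c in cores)


if __name__ == "__main__":
    demo = [("a", 8, 4), ("b", 7, 4), ("c", 6, 4), ("d", 5, 4)]
    mapped = map_filters(demo, num_cores=2, spm_bytes=8)
    for i, core in enumerate(mapped):
        print(i, core.filters, core.load)
    print("bottleneck:", bottleneck(mapped))
```

In this toy model the scratchpad limit alone can force a less balanced mapping or make a mapping impossible, which is the pressure that a code overlay scheme relieves, at the cost of overlay traffic and of SPM space that could otherwise buffer inter-processor communication.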
