A lightweight streaming layer for multicore execution

As multicore architectures gain widespread use, it becomes increasingly important to be able to harness their additional processing power to achieve higher performance. However, exploiting parallel cores to improve single-program performance is difficult from a programmer's perspective because most existing programming languages dictate a sequential method of execution. Stream programming, which organizes programs by independent filters communicating over explicit data channels, exposes useful types of parallelism that can be exploited. However, there is still the burden of mapping high-level stream programs to specific multicore architectures. The complexities of each architecture's underlying details makes it difficult to schedule the execution of a stream program with high performance. In this paper, we present the specifications for an intermediate layer between the stream program and the target architecture. This multicore streaming layer (MSL) provides a common level of abstraction that facilitates efficient execution of stream programs by making it easier for compilers to manage computation, and by providing automatic orchestration and optimization of communication when appropriate. We implemented a framework for one such instance of the MSL targeted to the Cell processor and the StreamIt language and achieved greater than 88% utilization on all benchmarks with relatively small amounts of code. The framework can also be applied to other architectures and stream programming languages to enhance generality and portability.

[1]  Christoforos E. Kozyrakis,et al.  The stream virtual machine , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[2]  William J. Dally,et al.  Programmable Stream Processors , 2003, Computer.

[3]  William J. Dally,et al.  Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.

[4]  Mendel Rosenblum,et al.  Stream programming on general-purpose processors , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[5]  Jack Dongarra,et al.  SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3 , 2007 .

[6]  Xin David Zhang,et al.  A Streaming Computation Framework for the Cell Processor , 2007 .

[7]  Michael I. Gordon,et al.  Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.

[8]  Hiroshi Inoue,et al.  MPI microtask for programming the Cell Broadband Enginee , 2006 .

[9]  M. J. Prelle,et al.  MultiCore Framework: An API for Programming Heterogeneous Multicore Processors , 2006 .

[10]  M. McCool Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform , 2006 .

[11]  Anant Agarwal,et al.  Energy scalability of on-chip interconnection networks , 2007 .

[12]  William R. Mark,et al.  Cg: a system for programming graphics hardware in a C-like language , 2003, ACM Trans. Graph..

[13]  H. Peter Hofstee,et al.  Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.

[14]  P. Hanrahan,et al.  Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[15]  Robert Stephens,et al.  A survey of stream processing , 1997, Acta Informatica.

[16]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[17]  Rosa M. Badia,et al.  CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[18]  Henry Hoffmann,et al.  A stream compiler for communication-exposed architectures , 2002, ASPLOS X.