A stream compiler for communication-exposed architectures

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.

In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element.

We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.
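The idea of adjusting stream-graph granularity by fusion can be illustrated with a small sketch. This is not the partitioning algorithm of the paper, whose details the abstract does not give. It is a simplified greedy heuristic, assuming a linear pipeline of filters with known work estimates, and it covers only fusion (fission, which splits a heavy filter, is omitted). The function name `fuse_to_fit`, the work estimates, and the tile count are all hypothetical.

```python
def fuse_to_fit(work, num_tiles):
    """Greedily fuse adjacent pipeline stages until at most num_tiles remain.

    work      -- estimated work per filter, in pipeline order (assumed inputs)
    num_tiles -- number of processing elements available (assumed >= 1)

    Returns a list of (member_filter_indices, total_work) tuples.
    Illustrative heuristic only; not the algorithm described in the paper.
    """
    if num_tiles < 1:
        raise ValueError("num_tiles must be at least 1")

    # Start with one stage per filter.
    stages = [([i], w) for i, w in enumerate(work)]

    while len(stages) > num_tiles:
        # Choose the adjacent pair whose fused work is smallest, so the
        # heaviest resulting stage grows as slowly as possible.
        k = min(range(len(stages) - 1),
                key=lambda j: stages[j][1] + stages[j + 1][1])
        left, right = stages[k], stages[k + 1]
        stages[k:k + 2] = [(left[0] + right[0], left[1] + right[1])]

    return stages


if __name__ == "__main__":
    # Five filters mapped onto three tiles.
    for members, total in fuse_to_fit([4, 1, 1, 6, 2], 3):
        print(members, total)
```

Fusing only adjacent stages keeps the pipeline order intact, which is why the sketch never merges non-neighboring filters even when that would balance the load better.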
