Extracting Coarse-Grained Pipelined Parallelism Out of Sequential Applications for Parallel Processor Arrays

We present development and runtime support for building application-specific data processing pipelines out of sequential code, and for executing them on a general-purpose platform that features a reconfigurable Parallel Processor Array (PPA). In our approach, the programmer annotates the application source to indicate the desired pipeline stages and the data flow between them, with little code restructuring. A pre-processor then splits the annotated program into code segments according to the indicated pipeline structure, generates the corresponding executable code, and produces a bundled application package containing all executables and the deployment information for the target platform. Dedicated mechanisms set up the application-specific pipeline structure on the PPA and integrate its execution with a general-purpose operating system, so that the pipelined application can access the usual system peripherals and run concurrently with other conventional programs. To verify our approach, we have built a prototype system using soft processor arrays on an embedded FPGA platform, and transformed a well-known application into a pipelined version that executes successfully on our prototype.
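As an illustration of the general idea only, the following minimal sketch shows how the body of a sequential loop can be split into stages that run concurrently and exchange data through bounded FIFOs. It does not reproduce the annotation syntax, the pre-processor, or the PPA runtime described above: the stage functions (`decode`, `square`, `encode`), the `run_pipeline` helper, and the FIFO depth are all hypothetical, and Python threads stand in for the processors of the array.

```python
import queue
import threading

SENTINEL = object()  # end-of-stream marker passed from stage to stage


def run_stage(func, inbox, outbox):
    """Apply func to every item read from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(func(item))


def feed(inputs, fifo):
    """Push all inputs into the first FIFO, then signal end of stream."""
    for item in inputs:
        fifo.put(item)
    fifo.put(SENTINEL)


def run_pipeline(stages, inputs, depth=4):
    """Run each stage in its own thread, linked by bounded FIFOs.

    Hypothetical helper: one thread per stage stands in for one
    processor per stage; depth is an assumed FIFO capacity.
    """
    fifos = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(f, fifos[i], fifos[i + 1]))
        for i, f in enumerate(stages)
    ]
    # A separate feeder thread lets the caller drain results while
    # inputs are still being pushed, so bounded FIFOs cannot deadlock.
    workers.append(threading.Thread(target=feed, args=(inputs, fifos[0])))
    for w in workers:
        w.start()

    results = []
    while True:
        item = fifos[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


# Hypothetical stages standing in for the annotated regions of a loop body.
def decode(text):
    return int(text)


def square(n):
    return n * n


def encode(n):
    return f"<{n}>"


def run_sequential(inputs):
    """The original, unpipelined loop, kept for comparison."""
    return [encode(square(decode(x))) for x in inputs]
```

Because each stage is served by a single thread and the queues are FIFO, results arrive in input order, so `run_pipeline([decode, square, encode], data)` returns the same list as `run_sequential(data)`.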
