Scalable-Grain Pipeline Parallelization Method for Multi-core Systems

How to parallelize the great amount of legacy sequential programs is the most difficult challenge faced by multi-core designers. The existing parallelization methods at the compile time due to the obscured data dependences in C are not suitable for exploring the parallelism of streaming applications. In this paper, a software pipeline for multi-layer loop method is proposed for streaming applications to exploit the coarse-grained pipeline parallelism hidden in multi-layer loops. The proposed method consists of three major steps: 1 transform the task dependence graph of a streaming application to resolve intricate dependence, 2 schedule tasks to multiprocessor system-on-chip with the objective of minimizing the maximal execution time of all pipeline stages, and 3 adjust the granularity of pipeline stages to balance the workload among all stages. The efficiency of the method is validated by case studies of typical streaming applications on multi-core embedded system.

[1]  David I. August,et al.  Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[2]  William Thies,et al.  A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[3]  Easwaran Raman,et al.  Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[4]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[5]  Hsien-Hsin S. Lee,et al.  Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[6]  Koen De Bosschere,et al.  The paralax infrastructure: Automatic parallelization with a helping hand , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[7]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[8]  Gu-Yeon Wei,et al.  HELIX: automatic parallelization of irregular programs for chip multiprocessing , 2012, CGO '12.

[9]  Wen-mei W. Hwu,et al.  Automatic Discovery of Coarse-Grained Parallelism in Media Applications , 2007, Trans. High Perform. Embed. Archit. Compil..

[10]  Easwaran Raman,et al.  Parallel-stage decoupled software pipelining , 2008, CGO '08.

[11]  Rainer Leupers,et al.  MAPS: An integrated framework for MPSoC application parallelization , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[12]  Oliver Sinnen,et al.  Task Scheduling for Parallel Systems , 2007, Wiley series on parallel and distributed computing.

[13]  Yingtao Jiang,et al.  Building a multi-FPGA-based emulation framework to support networks-on-chip design and verification , 2010 .

[14]  Ron Cytron,et al.  Doacross: Beyond Vectorization for Multiprocessors , 1986, ICPP.

[15]  Oliver Sinnen,et al.  Task Scheduling for Parallel Systems (Wiley Series on Parallel and Distributed Computing) , 2007 .

[16]  Richard M. Stallman,et al.  Using the GNU Compiler Collection , 2010 .

[17]  Yun Zhang,et al.  Decoupled software pipelining creates parallelization opportunities , 2010, CGO '10.

[18]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .