Paper Information - Scalable-Grain Pipeline Parallelization Method for Multi-core Systems

Parallelizing the large body of legacy sequential programs is the most difficult challenge faced by multi-core designers. Existing compile-time parallelization methods are poorly suited to exploiting the parallelism of streaming applications, because data dependences in C are obscured. In this paper, a software pipelining method for multi-layer loops is proposed for streaming applications, to exploit the coarse-grained pipeline parallelism hidden in multi-layer loops. The proposed method consists of three major steps: (1) transform the task dependence graph of a streaming application to resolve intricate dependences, (2) schedule tasks onto a multiprocessor system-on-chip with the objective of minimizing the maximal execution time over all pipeline stages, and (3) adjust the granularity of the pipeline stages to balance the workload among all stages. The efficiency of the method is validated by case studies of typical streaming applications on a multi-core embedded system.
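The scheduling objective in step 2, minimizing the execution time of the slowest pipeline stage, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes, for illustration only, that the tasks already form a linear chain, that each task's cost is known, that stages are contiguous runs of tasks with one stage per core, and that communication costs are ignored. The function name `partition_stages` and the sample costs are hypothetical.

```python
from itertools import accumulate


def partition_stages(costs, num_stages):
    """Split a chain of task costs into contiguous pipeline stages so that
    the most expensive stage (the bottleneck) is as cheap as possible.

    Illustrative assumptions: linear task chain, known non-negative costs,
    no communication overhead. Returns (bottleneck, stages).
    """
    n = len(costs)
    if n == 0 or num_stages < 1:
        raise ValueError("need at least one task and one stage")
    k = min(num_stages, n)  # never more stages than tasks

    prefix = [0] + list(accumulate(costs))  # prefix[i] = sum(costs[:i])
    inf = float("inf")

    # best[s][i]: smallest bottleneck when the first i tasks form s stages.
    # cut[s][i]:  index where the last of those s stages begins.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0

    for s in range(1, k + 1):
        for i in range(s, n + 1):
            for j in range(s - 1, i):
                # Last stage holds tasks j..i-1; earlier tasks use s-1 stages.
                candidate = max(best[s - 1][j], prefix[i] - prefix[j])
                if candidate < best[s][i]:
                    best[s][i] = candidate
                    cut[s][i] = j

    # Walk the recorded cut points backwards to recover the stages.
    stages = []
    i = n
    for s in range(k, 0, -1):
        j = cut[s][i]
        stages.append(costs[j:i])
        i = j
    stages.reverse()
    return best[k][n], stages


if __name__ == "__main__":
    task_costs = [4, 2, 3, 7, 1, 5]  # hypothetical per-task execution times
    bottleneck, stages = partition_stages(task_costs, 3)
    print("bottleneck:", bottleneck)
    print("stages:", stages)
```

In a pipeline, throughput is bounded by the slowest stage, which is why the sketch minimizes the maximum stage cost rather than the total. Step 3 of the method, adjusting stage granularity, addresses the same balance from the other direction by changing what counts as a task.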
