Paper information - Branch-aware loop mapping on CGRAs

Branch-aware loop mapping on CGRAs

One of the challenges that all accelerators face is executing loops that contain if-then-else constructs. There are three ways to accelerate loops with an if-then-else construct on a coarse-grained reconfigurable architecture (CGRA): full predication, partial predication, and the dual-issue scheme. Compared with the other schemes, the dual-issue scheme may achieve the best performance, but it requires compiler support, which does not exist. In this paper, we develop compiler techniques to map loops with conditionals onto a CGRA for the dual-issue scheme. Our experiments show that: i) 40% of loops that can be accelerated on a CGRA have conditionals, ii) the proposed dual-issue scheme enables our compiler to accelerate loops 40% faster than the full predication scheme proposed in [12], and iii) our compiler-assisted dual-issue scheme can exploit richer interconnects, if present.

Aviral Shrivastava | Sarma B. K. Vrudhula | Mahdi Hamzeh

[1] Rainer Leupers, et al. Handbook of Signal Processing Systems, 2010.

[2] Nader Bagherzadeh, et al. A Modulo Scheduling Algorithm for a Coarse-Grain Reconfigurable Array Template, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[3] Kiyoung Choi, et al. Automatic mapping of application to coarse-grained reconfigurable architecture based on high-level synthesis techniques, 2008, 2008 International SoC Design Conference.

[4] Kiyoung Choi, et al. Compiling control-intensive loops for CGRAs with state-based full predication, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[5] Michalis D. Galanis, et al. Exploring the design space of an optimized compiler approach for mesh-like coarse-grained reconfigurable architectures, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[6] Jürgen Teich, et al. Regular mapping for coarse-grained reconfigurable architectures, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7] Carl Ebeling, et al. SPR: an architecture-adaptive CGRA mapping tool, 2009, FPGA '09.

[8] Kiyoung Choi, et al. Mapping control intensive kernels onto coarse-grained reconfigurable array architecture, 2008, 2008 International SoC Design Conference.

[9] Rajesh Gupta, et al. Network topology exploration of mesh-based coarse-grain reconfigurable architectures, 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[10] Liang Chen, et al. Graph minor approach for application mapping on CGRAs, 2012, FPT.

[11] Scott Mahlke, et al. Exploiting Instruction Level Parallelism in the Presence of Conditional Branches, 1997.

[12] Scott A. Mahlke, et al. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[13] Aviral Shrivastava, et al. REGIMap: Register-aware application mapping on Coarse-Grained Reconfigurable Architectures (CGRAs), 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[14] Bjorn De Sutter, et al. Coarse-Grained Reconfigurable Array Architectures, 2010, Handbook of Signal Processing Systems.

[15] SPEC CPU 2006 Benchmark Descriptions, 2006.

[16] Aviral Shrivastava, et al. EPIMap: Using Epimorphism to map applications on CGRAs, 2012, DAC Design Automation Conference 2012.

[17] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.

[18] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.

[19] Yunheung Paek, et al. A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures, 2006, Proceedings of the Design Automation & Test in Europe Conference.

[20] John L. Henning. SPEC CPU2006 benchmark descriptions, 2006, CARN.

[21] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.

[22] Kiyoung Choi, et al. Acceleration of control flow on CGRA using advanced predicated execution, 2010, 2010 International Conference on Field-Programmable Technology.

[23] Rudy Lauwereins, et al. Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling, 2003, DATE.

[24] Kiyoung Choi, et al. Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA, 2013, TACO.