Branch-aware loop mapping on CGRAs

One challenge that all accelerators face is executing loops that contain if-then-else constructs. There are three ways to accelerate such loops on a coarse-grained reconfigurable architecture (CGRA): full predication, partial predication, and the dual-issue scheme. Compared with the other two, the dual-issue scheme may achieve the best performance, but it requires compiler support, which does not currently exist. In this paper, we develop compiler techniques to map loops with conditionals onto a CGRA for the dual-issue scheme. Our experiments show that: i) 40% of the loops that can be accelerated on a CGRA have conditionals; ii) the proposed dual-issue scheme enables our compiler to accelerate loops 40% faster than the full predication scheme proposed in [12]; and iii) our compiler-assisted dual-issue scheme can exploit richer interconnects, if present.
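The three schemes named above can be contrasted on a toy loop. The sketch below is only a software model of their semantics as they are commonly described in the CGRA literature, not the compiler or hardware from this paper; the function names, the example loop body, and the input data are all hypothetical. It assumes that full predication guards each operation with the branch predicate, that partial predication executes both paths and picks a result with a select operation, and that dual-issue gives each processing element one operation per path and executes only the one chosen at run time.

```python
def reference(xs):
    """Loop with an if-then-else body, as a plain Python baseline."""
    out = []
    for x in xs:
        if x > 0:
            y = x * 2
        else:
            y = x - 1
        out.append(y)
    return out


def full_predication(xs):
    """Assumed model: both paths are issued; an op commits only if its guard holds."""
    out = []
    for x in xs:
        p = x > 0
        y = None
        if p:        # guarded operation from the then-path
            y = x * 2
        if not p:    # guarded operation from the else-path
            y = x - 1
        out.append(y)
    return out


def partial_predication(xs):
    """Assumed model: both paths execute unconditionally; a select picks the result."""
    out = []
    for x in xs:
        p = x > 0
        y_then = x * 2
        y_else = x - 1
        out.append(y_then if p else y_else)  # select operation
    return out


def dual_issue(xs):
    """Assumed model: a slot holds one op per path; only the chosen op executes."""
    slot = (lambda x: x * 2, lambda x: x - 1)  # (then-op, else-op)
    out = []
    for x in xs:
        p = x > 0
        op = slot[0] if p else slot[1]
        out.append(op(x))
    return out


data = [3, -1, 0, 7]  # hypothetical input
results = [f(data) for f in
           (reference, full_predication, partial_predication, dual_issue)]
```

All four functions compute the same values; the schemes differ in how many operations occupy resources per iteration, which this functional model does not measure.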

[1] Rainer Leupers, et al. Handbook of Signal Processing Systems, 2010.

[2] Nader Bagherzadeh, et al. A Modulo Scheduling Algorithm for a Coarse-Grain Reconfigurable Array Template, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[3] Kiyoung Choi, et al. Automatic mapping of application to coarse-grained reconfigurable architecture based on high-level synthesis techniques, 2008, 2008 International SoC Design Conference.

[4] Kiyoung Choi, et al. Compiling control-intensive loops for CGRAs with state-based full predication, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[5] Michalis D. Galanis, et al. Exploring the design space of an optimized compiler approach for mesh-like coarse-grained reconfigurable architectures, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[6] Jürgen Teich, et al. Regular mapping for coarse-grained reconfigurable architectures, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7] Carl Ebeling, et al. SPR: an architecture-adaptive CGRA mapping tool, 2009, FPGA '09.

[8] Kiyoung Choi, et al. Mapping control intensive kernels onto coarse-grained reconfigurable array architecture, 2008, 2008 International SoC Design Conference.

[9] Rajesh Gupta, et al. Network topology exploration of mesh-based coarse-grain reconfigurable architectures, 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[10] Liang Chen, et al. Graph minor approach for application mapping on CGRAs, 2012, FPT.

[11] Scott Mahlke, et al. Exploiting Instruction Level Parallelism in the Presence of Conditional Branches, 1997.

[12] Scott A. Mahlke, et al. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[13] Aviral Shrivastava, et al. REGIMap: Register-aware application mapping on Coarse-Grained Reconfigurable Architectures (CGRAs), 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[14] Bjorn De Sutter, et al. Coarse-Grained Reconfigurable Array Architectures, 2010, Handbook of Signal Processing Systems.

[15] SPEC CPU 2006 Benchmark Descriptions, 2006.

[16] Aviral Shrivastava, et al. EPIMap: Using Epimorphism to map applications on CGRAs, 2012, DAC Design Automation Conference 2012.

[17] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.

[18] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.

[19] Yunheung Paek, et al. A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures, 2006, Proceedings of the Design Automation & Test in Europe Conference.

[20] John L. Henning. SPEC CPU2006 benchmark descriptions, 2006, CARN.

[21] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.

[22] Kiyoung Choi, et al. Acceleration of control flow on CGRA using advanced predicated execution, 2010, 2010 International Conference on Field-Programmable Technology.

[23] Rudy Lauwereins, et al. Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling, 2003, DATE.

[24] Kiyoung Choi, et al. Power-Efficient Predication Techniques for Acceleration of Control Flow Execution on CGRA, 2013, TACO.