A just-in-time modulo scheduling for virtual coarse-grained reconfigurable architectures

In the past decade, most solutions concerning the mapping of the compute-intensive loop kernels to accelerators have used heuristics and compiler-based strategies. These facts require that most of the decisions be taken at design time, thus precluding efficient solutions that can take run-time information into account. Any success in accelerating such applications greatly depends on two steps, extracting the loops and mapping them into the architecture. This last step is a challenge in itself since it is a NP-complete problem. In this paper, we propose a runtime solution that can provide speed ups of 3 to 6 orders of magnitude for the mapping step when compared to the state-of-the-art at minimal performance degradation, by the combined usage of 3 distinct mechanisms: 1) a simple and efficient modulo scheduling heuristic, 2) a crossbar network, which simplifies the placement and routing, 3) a virtual coarse-grained reconfigurable architecture (CGRA). Additionally, since the CGRA is a virtual layer on top of an FPGA, it is possible to use any off-the-shelf FPGA without the need of special tools or IP solutions. Although the mapping is NP-complete even for crossbar-based CGRAs, experimental results demonstrate a huge reduction in compilation time, as opposed to previous solutions that require seconds to map the applications, our solution requires only microseconds to find near optimal schedules. Besides the speed up, the proposed solution enables the use of just-in-time compilation, hence it is intrinsically adaptive to a changing scenario.

[1]  Scott A. Mahlke,et al.  Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures , 2009, LCTES '09.

[2]  Rudy Lauwereins,et al.  Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling , 2003, DATE.

[3]  Scott A. Mahlke,et al.  Edge-centric modulo scheduling for coarse-grained reconfigurable architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[4]  Miodrag Potkonjak,et al.  An integrated system for rapid prototyping of high performance algorithm specific data paths , 1992, [1992] Proceedings of the International Conference on Application Specific Array Processors.

[5]  Yunheung Paek,et al.  A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[6]  Stephan Wong,et al.  The Delft Reconfigurable VLIW Processor , 2010 .

[7]  Nader Bagherzadeh,et al.  A Modulo Scheduling Algorithm for a Coarse-Grain Reconfigurable Array Template , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[8]  Aviral Shrivastava,et al.  EPIMap: Using Epimorphism to map applications on CGRAs , 2012, DAC Design Automation Conference 2012.

[9]  Liang Chen,et al.  Graph minor approach for application mapping on CGRAs , 2012, 2012 International Conference on Field-Programmable Technology.

[10]  Scott A. Mahlke,et al.  Polymorphic Pipeline Array: A flexible multicore accelerator with virtualized execution for mobile multimedia applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  Luigi Carro,et al.  An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture , 2011, 2011 Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES).

[12]  B. Ramakrishna Rau,et al.  Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.

[13]  Rudy Lauwereins,et al.  DRESC: a retargetable compiler for coarse-grained reconfigurable architectures , 2002, 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings..

[14]  Georgi Gaydadjiev,et al.  Architectural Exploration of the ADRES Coarse-Grained Reconfigurable Array , 2007, ARC.

[15]  Scott A. Mahlke,et al.  Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures , 2006, CASES '06.

[16]  Aviral Shrivastava,et al.  SPKM : A novel graph drawing based algorithm for application mapping onto coarse-grained reconfigurable architectures , 2008, 2008 Asia and South Pacific Design Automation Conference.

[17]  Bjorn De Sutter,et al.  Placement-and-routing-based register allocation for coarse-grained reconfigurable arrays , 2008, LCTES '08.