Rotated parallel mapping: A novel approach for mapping data parallel applications on CGRAs

In this paper we present a new way of mapping data-parallel applications onto coarse-grained reconfigurable architectures (CGRAs) to increase their performance. Traditional mapping approaches aim to map an application to a minimum number of contexts. In this work we abandon that goal and instead propose using the temporal domain, with multiple contexts, as the preferred mapping domain. The benefit of this approach is that enough resources are freed for a parallel execution of a datapath, which enables a higher utilization of a CGRA's resources and thus a performance increase. To show the validity of the proposed method, the speedup of various applications is evaluated in both theoretical and experimental studies. The results show a performance improvement of up to 122% compared to traditional application mapping techniques.
