Integrated Kernel Partitioning and Scheduling for Coarse-Grained Reconfigurable Arrays

Coarse-grained reconfigurable arrays (CGRAs) are a promising class of architectures conjugating flexibility and efficiency. Devising effective methodologies to map applications onto CGRAs is a challenging task, due to their parallel execution paradigm and constrained hardware resources. In order to handle complex applications, it is important to devise efficient strategies to partition a kernel into pieces that obey resource constraint and methodologies to schedule them on the underlying hardware. In this paper, we tackle these problems by proposing algorithms to address partitioning based on recursive searches over abstract trees. A novel scheduling strategy is also described that, leveraging differences in delays of various operations, is able to efficiently map operations on CGRA architectures. Experimental evidence on kernels derived from a diverse set of data flow graphs and EEMBC benchmarks demonstrate the efficacy of the described methods, which, when combined, achieve a higher runtime performance on a given mesh size than state-of-the-art approaches (as much as 38% for the benchmark applications considered).

[1]  Ken Kennedy,et al.  Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.

[2]  Nikil D. Dutt,et al.  Slack-aware scheduling on Coarse Grained Reconfigurable Arrays , 2011, 2011 Design, Automation & Test in Europe.

[3]  Fadi J. Kurdahi,et al.  Design and Implementation of the MorphoSys Reconfigurable Computing Processor , 2000, J. VLSI Signal Process..

[4]  Aviral Shrivastava,et al.  SPKM : A novel graph drawing based algorithm for application mapping onto coarse-grained reconfigurable architectures , 2008, 2008 Asia and South Pacific Design Automation Conference.

[5]  Markus Weinhardt,et al.  PACT XPP—A Self-Reconfigurable Data Processing Architecture , 2003, The Journal of Supercomputing.

[6]  Ieee Circuits,et al.  IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems information for authors , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[7]  Paolo Ienne,et al.  Automatic application-specific instruction-set extensions under microarchitectural constraints , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[8]  Rudy Lauwereins,et al.  DRESC: a retargetable compiler for coarse-grained reconfigurable architectures , 2002, 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings..

[9]  Kiyoung Choi,et al.  An algorithm for mapping loops onto coarse-grained reconfigurable architectures , 2003, LCTES '03.

[10]  Gerard J. M. Smit,et al.  Mapping of DSP algorithms on the MONTIUM architecture , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[11]  Kiyoung Choi,et al.  Resource sharing and pipelining in coarse-grained reconfigurable architecture for domain-specific optimization , 2005, Design, Automation and Test in Europe.

[12]  Markus Weinhardt,et al.  XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Architecture , 2002, FPL.

[13]  Tulika Mitra,et al.  Scalable custom instructions identification for instruction-set extensible processors , 2004, CASES '04.

[14]  Paolo Bonzini,et al.  Heterogeneous coarse-grained processing elements: A template architecture for embedded processing acceleration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[15]  Paolo Bonzini,et al.  Compiling custom instructions onto expression-grained reconfigurable architectures , 2008, CASES '08.

[16]  Scott A. Mahlke,et al.  Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures , 2006, CASES '06.

[17]  Brian W. Kernighan,et al.  An efficient heuristic procedure for partitioning graphs , 1970, Bell Syst. Tech. J..

[18]  Kunle Olukotun,et al.  REMARC : Reconfigurable Multimedia Array Coprocessor , 1999 .

[19]  Li Jing,et al.  High-Level Synthesis Challenges and Solutions for a Dynamically Reconfigurable Processor , 2006, 2006 IEEE/ACM International Conference on Computer Aided Design.

[20]  Paolo Ienne,et al.  Exact and approximate algorithms for the extension of embedded processor instruction sets , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[21]  Yunheung Paek,et al.  A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[22]  Dinesh Bhatia,et al.  Temporal Partitioning and Scheduling Data Flow Graphs for Reconfigurable Computers , 1999, IEEE Trans. Computers.

[23]  Georgi Gaydadjiev,et al.  Architectural Exploration of the ADRES Coarse-Grained Reconfigurable Array , 2007, ARC.

[24]  Charles Oliver Area efficient layouts of binary trees in grids , 2001 .

[25]  Reiner W. Hartenstein,et al.  A decade of reconfigurable computing: a visionary retrospective , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.

[26]  Ranga Vemuri,et al.  Optimal temporal partitioning and synthesis for reconfigurable architectures , 1998, Proceedings Design, Automation and Test in Europe.

[27]  Paolo Bonzini,et al.  EGRA: A Coarse Grained Reconfigurable Architectural Template , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[28]  Martin D. F. Wong,et al.  Network flow based circuit partitioning for time-multiplexed FPGAs , 1998, ICCAD '98.

[29]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..

[30]  Scott A. Mahlke,et al.  CGRA express: accelerating execution using dynamic operation fusion , 2009, CASES '09.

[31]  Rudy Lauwereins,et al.  Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling , 2003, DATE.

[32]  Nader Bagherzadeh,et al.  A Modulo Scheduling Algorithm for a Coarse-Grain Reconfigurable Array Template , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[33]  Paolo Bonzini,et al.  Design and Architectural Exploration of Expression-Grained Reconfigurable Arrays , 2008, 2008 Symposium on Application Specific Processors.

[34]  Rudy Lauwereins,et al.  Low Power Coarse-Grained Reconfigurable Instruction Set Processor , 2003, FPL.