A Fully Pipelined and Dynamically Composable Architecture of CGRA

Future processor chips will not be limited by the transistor resources, but will be mainly constrained by energy efficiency. Reconfigurable fabrics bring higher energy efficiency than CPUs via customized hardware that adapts to user applications. Among different reconfigurable fabrics, coarse-grained reconfigurable arrays (CGRAs) can be even more efficient than fine-grained FPGAs when bit-level customization is not necessary in target applications. CGRAs were originally developed in the era when transistor resources were more critical than energy efficiency. Previous work shares hardware among different operations via modulo scheduling and time multiplexing of processing elements. In this work, we focus on an emerging scenario where transistor resources are rich. We develop a novel CGRA architecture that enables full pipelining and dynamic composition to improve energy efficiency by taking full advantage of abundant transistors. Several new design challenges are solved. We implement a prototype of the proposed architecture in a commodity FPGA chip for verification. Experiments show that our architecture can fully exploit the energy benefits of customization for user applications in the scenario of rich transistor resources.

[1]  Todor Stefanov,et al.  pn: A Tool for Improved Derivation of Process Networks , 2007, EURASIP J. Embed. Syst..

[2]  Jason Cong,et al.  High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[3]  Scott A. Mahlke,et al.  Scalable subgraph mapping for acyclic computation accelerators , 2006, CASES '06.

[4]  Jason Cong,et al.  A reuse-aware prefetching scheme for scratchpad memory , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[5]  Abraham Waksman,et al.  A Permutation Network , 1968, JACM.

[6]  Seth Copen Goldstein,et al.  PipeRench: A Reconfigurable Architecture and Compiler , 2000, Computer.

[7]  Tom Feist,et al.  Vivado Design Suite , 2012 .

[8]  H. Franke,et al.  Introduction to the wire-speed processor and architecture , 2010, IBM J. Res. Dev..

[9]  Jonathan Rose,et al.  Measuring the Gap Between FPGAs and ASICs , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[10]  Gu-Yeon Wei,et al.  The accelerator store: A shared memory framework for accelerator-based systems , 2012, TACO.

[11]  Jason Cong,et al.  Accelerator-rich CMPs: From concept to real hardware , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[12]  Jean Vuillemin,et al.  A reconfigurable arithmetic array for multimedia applications , 1999, FPGA '99.

[13]  André DeHon,et al.  MATRIX: a reconfigurable computing architecture with configurable instruction distribution and deployable resources , 1996, 1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.

[14]  Jason Cong,et al.  Pattern-based behavior synthesis for FPGA resource reduction , 2008, FPGA '08.

[15]  Jason Cong,et al.  Architecture support for accelerator-rich CMPs , 2012, DAC Design Automation Conference 2012.

[16]  Karthikeyan Sankaralingam,et al.  DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.

[17]  Jason Cong,et al.  Customizable Domain-Specific Computing , 2009, IEEE Design & Test of Computers.

[18]  Jason Cong,et al.  CHARM: a composable heterogeneous accelerator-rich microprocessor , 2012, ISLPED '12.

[19]  Reiner W. Hartenstein Coarse grain reconfigurable architectures , 2001, Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455).

[20]  Fadi J. Kurdahi,et al.  MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.