论文信息 - A Fully Pipelined and Dynamically Composable Architecture of CGRA (Coarse Grained Reconfigurable Architecture)

A Fully Pipelined and Dynamically Composable Architecture of CGRA (Coarse Grained Reconfigurable Architecture)

Future processor will not be limited by the transistor resources, but will be mainly constrained by energy efficiency. Reconfigurable architecture offers higher energy efficiency than CPUs through customized hardware and more flexibility than ASICs. FPGAs allow configurability at bit level to keep both efficiency and flexibility. However, in many computation-intensive applications, only word level customizations are necessary, which inspires coarse-grained reconfigurable arrays(CGRAs) to raise configurability to word level and to reduce configuration information, and to enable on-the-fly customization. Traditional CGRAs are designed in the era when transistor resources are scarce. Previous work in CGRAs share hardware resources among different operations via modulo scheduling and time multiplexing processing elements. In the emerging scenario where transistor resources are rich, we develop a novel CGRA architecture that features full pipelining and dynamic composition to improve energy efficiency and implement the prototype on Xilinx Virtex-6 FPGA board. Experiments show that fully pipelined and dynamically composable architecture(FPCA) can exploit the energy benefits of customization for user applications when the transistor resources are rich.

Peipei Zhou

[1] André DeHon,et al. MATRIX: a reconfigurable computing architecture with configurable instruction distribution and deployable resources , 1996, 1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.

[2] Jean Vuillemin,et al. A reconfigurable arithmetic array for multimedia applications , 1999, FPGA '99.

[3] Jason Cong,et al. Pattern-based behavior synthesis for FPGA resource reduction , 2008, FPGA '08.

[4] Seth Copen Goldstein,et al. PipeRench: A Reconfigurable Architecture and Compiler , 2000, Computer.

[5] Jason Cong,et al. A reuse-aware prefetching scheme for scratchpad memory , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[6] Todor Stefanov,et al. pn: A Tool for Improved Derivation of Process Networks , 2007, EURASIP J. Embed. Syst..

[7] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.

[8] Abraham Waksman,et al. A Permutation Network , 1968, JACM.

[9] Georgi Gaydadjiev,et al. Architectural Exploration of the ADRES Coarse-Grained Reconfigurable Array , 2007, ARC.

[10] Reiner W. Hartenstein. Coarse grain reconfigurable architectures , 2001, Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455).

[11] Jonathan Rose,et al. Measuring the Gap Between FPGAs and ASICs , 2007, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[12] Jason Cong,et al. Customizable Domain-Specific Computing , 2009, IEEE Design & Test of Computers.

[13] Tom Feist,et al. Vivado Design Suite , 2012 .

[14] H. Franke,et al. Introduction to the wire-speed processor and architecture , 2010, IBM J. Res. Dev..

[15] Jason Cong,et al. Accelerator-rich CMPs: From concept to real hardware , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[16] Jason Cong,et al. CHARM: a composable heterogeneous accelerator-rich microprocessor , 2012, ISLPED '12.

[17] Fadi J. Kurdahi,et al. MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.

[18] Rudy Lauwereins,et al. ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix , 2003, FPL.

[19] Gu-Yeon Wei,et al. The accelerator store: A shared memory framework for accelerator-based systems , 2012, TACO.

[20] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[21] Jason Cong,et al. Memory partitioning and scheduling co-optimization in behavioral synthesis , 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[22] Scott A. Mahlke,et al. Scalable subgraph mapping for acyclic computation accelerators , 2006, CASES '06.

[23] Scott A. Mahlke,et al. Polymorphic Pipeline Array: A flexible multicore accelerator with virtualized execution for mobile multimedia applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[24] Jason Cong,et al. Architecture support for accelerator-rich CMPs , 2012, DAC Design Automation Conference 2012.

[25] Rudy Lauwereins,et al. A Coarse-Grained Array Accelerator for Software-Defined Radio Baseband Processing , 2008, IEEE Micro.