Architecture Exploration of Standard-Cell and FPGA-Overlay CGRAs Using the Open-Source CGRA-ME Framework

We describe an open-source software framework, CGRA-ME, for the modeling and exploration of coarse-grained reconfigurable architectures (CGRAs). CGRAs are programmable hardware devices with large ALU-like logic blocks and datapath bus-style interconnect. On the spectrum of programmability, CGRAs sit between fine-grained FPGAs and standard-cell ASICs: they are less flexible than FPGAs but more flexible than ASICs. With CGRA-ME, an architect can describe a CGRA architecture in an XML-based language. The framework also allows the architect to map benchmarks onto the architecture, and it automatically generates Verilog RTL for the modeled architecture. The architect can then simulate the design for verification and synthesize it to either an ASIC or an FPGA-overlay implementation of the CGRA, assessing performance, area, and power consumption. In an experimental study, we use CGRA-ME to model, map benchmarks onto, and evaluate several variants of a widely known CGRA, considering both standard-cell and FPGA-overlay physical realizations of the CGRA.
