A reconfigurable memory architecture for system integration of coarse-grained reconfigurable arrays

Coarse-Grained Reconfigurable Arrays (CGRAs) have emerged as a powerful solution for speeding up computationally intensive applications. Heterogeneous MPSoC architectures containing such reconfigurable accelerators offer greater flexibility, power efficiency, and high performance. However, CGRAs may suffer from a data access bottleneck. To mitigate this problem, we present a reconfigurable memory architecture for CGRAs in which buffers can be configured at runtime to select between different memory access schemes, i.e., random access memory or pixel buffers. We showcase the benefits of our approach by prototyping, in FPGA technology, a heterogeneous MPSoC architecture containing a RISC processor and a class of CGRAs called tightly coupled processor arrays (TCPAs). To communicate with up to 32 processing elements (PEs), the memory architecture uses less than 2.5% of the slice registers and LUTs available in a Virtex-7 XC7V2000. For digital signal processing applications, we demonstrate that our solution for system integration increases memory bandwidth utilization in comparison to state-of-the-art solutions for image processing.
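The idea of a buffer whose access scheme is selected at runtime can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware design described in the paper: the names `Mode`, `ReconfigurableBuffer`, and `configure` are hypothetical, the pixel-buffer mode is approximated as a simple first-in first-out stream, and reconfiguration is assumed to clear the buffer contents.

```python
from collections import deque
from enum import Enum
from typing import Optional


class Mode(Enum):
    RAM = "ram"      # addressable storage (random access)
    PIXEL = "pixel"  # streaming storage, modeled here as a FIFO (assumption)


class ReconfigurableBuffer:
    """Toy model of one buffer; all names are hypothetical."""

    def __init__(self, depth: int, mode: Mode = Mode.RAM) -> None:
        self.depth = depth
        self.configure(mode)

    def configure(self, mode: Mode) -> None:
        # Assumption: switching the access scheme clears the contents.
        self.mode = mode
        self.ram = [0] * self.depth
        self.fifo = deque(maxlen=self.depth)

    def write(self, value: int, addr: Optional[int] = None) -> None:
        if self.mode is Mode.RAM:
            if addr is None:
                raise ValueError("RAM mode requires an address")
            self.ram[addr] = value
        else:
            # When the FIFO is full, the oldest entry is dropped.
            self.fifo.append(value)

    def read(self, addr: Optional[int] = None) -> int:
        if self.mode is Mode.RAM:
            if addr is None:
                raise ValueError("RAM mode requires an address")
            return self.ram[addr]
        # Pixel mode returns values in arrival order.
        return self.fifo.popleft()
```

In this model, a consumer would call `configure(Mode.RAM)` for kernels with irregular addressing and `configure(Mode.PIXEL)` for streaming image kernels; the same storage object serves both cases, which is the property the abstract highlights.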
