Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture

By using a coarser operand grain and simplified interconnection patterns, coarse-grained reconfigurable architectures (CGRAs) have been proven energy efficient in several specific domains. The speed at which contexts are applied to a processing element array (PEA) directly determines the performance of a CGRA. In this paper, the CGRA design space is extended from the perspective of configuration granularity with a middle-grained option: the row-based configuration mechanism (RCM). The most prominent feature of the RCM is that a large data flow graph (DFG) can be mapped onto a small array in a single reconfiguration, which is carried out on a row-by-row basis. Compared with an ordinary DFG-partitioning solution, both the reconfiguration time and the data transfer time are reduced. Furthermore, the proposed RCM stores contexts much more efficiently. Compared with the DFG-partitioning solution, performance improves by 2.6% to 57.8%, while the area penalty is only 4.79% and the power penalty is only 7.22%. The RCM has been used in a reconfigurable processor called REMUS HPA (reconfigurable multi-media system, high performance version advanced). REMUS HPA has been implemented on 50.5 mm² of silicon in TSMC 65 nm technology. Simulation shows that 1920×1088@37 fps can be achieved for H.264 high-profile decoding at a working frequency of 200 MHz. Compared with the high performance version of XPP (a commercial reconfigurable processor), performance is boosted by 247%.
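To put the reported decoding throughput in perspective, the arithmetic below estimates the average clock-cycle budget per macroblock implied by 1920×1088@37 fps at 200 MHz. It is only a back-of-the-envelope sketch: the 16×16 macroblock size is the standard H.264 value, while the function name `cycles_per_macroblock` and the assumption that the whole clock budget is spent on macroblock decoding are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT = 1920, 1088      # frame size in pixels
FPS = 37                        # simulated frame rate
CLOCK_HZ = 200_000_000          # 200 MHz working frequency

# H.264 macroblocks cover 16x16 luma samples.
MB_SIZE = 16


def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=MB_SIZE):
    """Average clock cycles available per macroblock (hypothetical helper)."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    mbs_per_second = mbs_per_frame * fps
    return clock_hz / mbs_per_second


budget = cycles_per_macroblock(WIDTH, HEIGHT, FPS, CLOCK_HZ)
print(f"{budget:.1f} cycles per macroblock")
```

Under these assumptions a frame holds 120 × 68 = 8160 macroblocks, so the array has on the order of 660 cycles per macroblock on average, which suggests why the time spent on reconfiguration and data transfer matters for overall performance.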
