Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only)

The memory architecture has a significant effect on the flexibility and performance of a coarse-grained reconfigurable array (CGRA), which can be restrained due to configuration overhead and large latency of data transmission. Multi-context structure and data preloading method are widely used in popular CGRAs as a solution to bandwidth bottlenecks of context and data. However, these two schemes cannot balance the computing performance, area overhead, and flexibility. This paper proposed group-based context cache and multi-level data memory architectures to alleviate the bottleneck problems. The group-based context cache was designed to dynamically transfer and buffer context inside CGRA in order to relieve the off-chip memory access for contexts at runtime. The multi-level data memory was designed to add data memories to different CGRA hierarchies, which were used as data buffers for reused input data and intermediate data. The proposed memory architectures are efficient and cost-effective so that performance improvement can be achieved at the cost of minor area overhead. Experiments of H.264 video decoding program and scale invariant feature transform algorithm achieved performance improvements of 19% and 23%, respectively. Further, the complexity of the applications running on CGRA is no longer restricted by the capacity of the on-chip context memory, thereby achieving flexible configuration for CGRA. The memory architectures proposed in this paper were based on a generic CGRA architecture derived from the characteristics found in the majority of existing popular CGRAs. As such, they can be applied to universal CGRAs.

[1]  Hui Gao,et al.  Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System , 2013, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..

[2]  Andrea Lodi,et al.  A dynamically adaptive DSP for heterogeneous reconfigurable platforms , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[3]  Jürgen Becker,et al.  H. 264 Decoder at HD Resolution on a Coarse Grain Dynamically Reconfigurable Architecture , 2007, 2007 International Conference on Field Programmable Logic and Applications.

[4]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[5]  Hui Xu,et al.  A low power many-core SoC with two 32-core clusters connected by tree based NoC for multimedia applications , 2012, 2012 Symposium on VLSI Circuits (VLSIC).

[6]  Michalis D. Galanis,et al.  Alleviating the data memory bandwidth bottleneck in coarse-grained reconfigurable arrays , 2005, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05).

[7]  George A. Constantinides,et al.  A Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[8]  Liang-Gee Chen,et al.  A 59.5mW scalable/multi-view video decoder chip for Quad/3D Full HDTV and video streaming applications , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[9]  Aviral Shrivastava,et al.  High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[10]  Wenquan Feng,et al.  An architecture of optimised SIFT feature detection for an FPGA implementation of an image matcher , 2009, 2009 International Conference on Field-Programmable Technology.

[11]  Bingfeng Mei,et al.  Mapping an H.264/AVC decoder onto the ADRES reconfigurable architecture , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[12]  Will Moffat,et al.  Custom implementation of the coarse-grained reconfigurable ADRES architecture for multimedia purposes , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[13]  Dong Wang,et al.  An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications , 2013, Proceedings of the IEEE 2013 Custom Integrated Circuits Conference.