Configuration Cache Management for Coarse-Grained Reconfigurable Architecture with Multi-Array

Coarse-Grained Reconfigurable Architectures (CGRAs) offer both high performance and flexibility, and multi-array CGRAs are used to meet the growing performance requirements of multimedia applications. Meanwhile, the context size also becomes quite large, so many CGRAs use a configuration cache to reduce reconfiguration overhead. However, the configuration cache itself consumes considerable power, so managing it efficiently remains a challenge. This paper first analyzes the context features of media algorithms and introduces the base hardware architecture. It then proposes a configuration cache management technique for implementing H.264 video decoding on that architecture. The technique comprises a novel configuration cache structure and a configuration cache replacement algorithm based on Context Sequence Prefetching & Priority (CSPP). Experimental results show that the proposed approach substantially improves system performance and reduces power consumption. The average configuration cache hit rate of CSPP is 96.83%, the speedup ranges from 64% to 109%, and the approach supports H.264 1080p@30fps decoding at a working frequency of 200 MHz.
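
To put the 1080p@30fps at 200 MHz figure in perspective, the following back-of-the-envelope sketch estimates the average cycle budget per macroblock. It is not taken from the paper: the function names are hypothetical, and it assumes the usual H.264 convention of 16x16 macroblocks with the frame height padded up to a multiple of 16 (1080 becomes 1088).

```python
import math

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma samples


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of macroblocks per frame, padding each dimension up to a multiple of 16."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def cycles_per_macroblock(clock_hz: float, width: int, height: int, fps: int) -> float:
    """Average clock cycles available per macroblock for real-time decoding."""
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


# Illustrative use with the figures quoted in the abstract.
budget = cycles_per_macroblock(200e6, 1920, 1080, 30)
print(f"{macroblocks_per_frame(1920, 1080)} macroblocks/frame, "
      f"about {budget:.0f} cycles per macroblock")
```

Under these assumptions the whole decoding pipeline has roughly 800 cycles per macroblock on average, which suggests why cycles lost to configuration cache misses weigh heavily at this operating point.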
