An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications

In this paper, we introduce a coarse-grained dynamically reconfigurable fabric, named the Reconfigurable Processing Unit (RPU), which is implemented on a 5.4×3.1 mm² silicon die in TSMC 65 nm LP1P8M technology. The fabric consists of 16×16 multi-functional Processing Elements (PEs) interconnected by an area-efficient Line-Switched Mesh Connect (LSMC) routing scheme. A Hierarchical Configuration Context (HCC) organization scheme is proposed to reduce the size of the context memory and improve configuration efficiency. Two reconfigurable processors were designed and fabricated to verify the proposed techniques. The first processor, REMUS_HPP, integrates two RPUs and targets high-performance applications. REMUS_HPP can decode 1920×1080@30fps H.264 streams at 280 mW when running at 200 MHz, achieving a 1.81x performance gain and a 14.3x energy-efficiency improvement over XPP-III. The second processor, REMUS_LPP, integrates a single RPU and targets low-power applications. REMUS_LPP can decode 720×480@35fps H.264 streams at 24.81 mW when running at 75 MHz, achieving a 76% power reduction and a 3.96x energy-efficiency improvement compared with ADRES. More importantly, the RPU is not limited to video decoding. It can also process other computation-intensive applications, and the corresponding analysis is given in this paper as well.
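To put the reported power and frame-rate figures on a common footing, the following minimal sketch derives energy per decoded frame and energy per decoded pixel from the numbers quoted above. These two metrics and the helper names are assumptions made for illustration; the abstract does not state how the paper itself defines energy efficiency, so the values are not expected to reproduce the 14.3x or 3.96x ratios.

```python
# Illustrative arithmetic only: the metrics below are assumed, not taken from the paper.

def energy_per_frame_mj(power_mw: float, fps: float) -> float:
    """Energy per decoded frame in millijoules (mW divided by frames/s gives mJ/frame)."""
    return power_mw / fps


def energy_per_pixel_nj(power_mw: float, width: int, height: int, fps: float) -> float:
    """Energy per decoded pixel in nanojoules."""
    pixels_per_second = width * height * fps
    watts = power_mw * 1e-3
    return watts / pixels_per_second * 1e9


# Figures quoted in the abstract.
hpp_frame = energy_per_frame_mj(280.0, 30.0)                 # REMUS_HPP, 1080p at 30 fps
lpp_frame = energy_per_frame_mj(24.81, 35.0)                 # REMUS_LPP, 720x480 at 35 fps
hpp_pixel = energy_per_pixel_nj(280.0, 1920, 1080, 30.0)
lpp_pixel = energy_per_pixel_nj(24.81, 720, 480, 35.0)

if __name__ == "__main__":
    print(f"REMUS_HPP: {hpp_frame:.2f} mJ/frame, {hpp_pixel:.2f} nJ/pixel")
    print(f"REMUS_LPP: {lpp_frame:.3f} mJ/frame, {lpp_pixel:.2f} nJ/pixel")
```

Normalizing by pixel rather than by frame makes the two processors comparable despite their different target resolutions.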
