Date Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse grain reconfigurable architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Unit), which are used to speeds up control-intensive tasks and data-intensive tasks respectively. The parallel computing capability and flexibility of REMUS-II makes itself an excellent candidate to process multimedia applications, which require a large amount of memory accesses. In this paper, we specifically optimize the data flow to deal with those performance-hazard and energy-hungry memory accessing in order to meet the bandwidth requirement of parallel computing. The RPU internal memory could work in multiple modes, like 2D-access mode and transformation mode, according to different multimedia access patterns. This novel design can improve the performance up to 26% compared to traditional on-chip memory. Meanwhile, the block buffer is implemented to optimize the off-chip data flow through reducing off-chip memory accesses, which reducing up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30fps of H.264 High Profile@ Level 4 and High Level MPEG2 at 200MHz clock frequency. The REMUS-II is implemented into 23.7mm2 silicon on TSMC 65nm logic process with a 400MHz maximum working frequency.

[1]  William J. Dally,et al.  Media processing applications on the Imagine stream processor , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[2]  Kunle Olukotun,et al.  A quantitative analysis of reconfigurable coprocessors for multimedia applications , 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).

[3]  Vaughn Betz,et al.  Architecture and CAD for Deep-Submicron FPGAS , 1999, The Springer International Series in Engineering and Computer Science.

[4]  Jonah Probell Architecture Considerations for Multi-Format Programmable Video Processors , 2008, J. Signal Process. Syst..

[5]  Seth Copen Goldstein,et al.  PipeRench: A Reconfigurable Architecture and Compiler , 2000, Computer.

[6]  Reiner W. Hartenstein Coarse grain reconfigurable architecture (embedded tutorial) , 2001, ASP-DAC '01.

[7]  David A. Patterson,et al.  Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[8]  Ajay Luthra,et al.  Overview of the H.264/AVC video coding standard , 2003, IEEE Trans. Circuits Syst. Video Technol..

[9]  Carl Ebeling,et al.  RaPiD - Reconfigurable Pipelined Datapath , 1996, FPL.

[10]  Marco Platzner,et al.  Zippy - a coarse-grained reconfigurable array with support for hardware virtualization , 2005, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05).

[11]  Andrea Lodi,et al.  A dynamically adaptive DSP for heterogeneous reconfigurable platforms , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[12]  Roberto Guerrieri,et al.  RTL-to-layout implementation of an embedded coarse grained architecture for dynamically reconfigurable computing in systems-on-chip , 2009, 2009 International Symposium on System-on-Chip.

[13]  Paolo Bonzini,et al.  EGRA: A Coarse Grained Reconfigurable Architectural Template , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[14]  Markus Weinhardt,et al.  PACT XPP—A Self-Reconfigurable Data Processing Architecture , 2004, The Journal of Supercomputing.

[15]  Naga K. Govindaraju,et al.  A Survey of General‐Purpose Computation on Graphics Hardware , 2007 .

[16]  Leibo Liu,et al.  A Cycle-Accurate Simulator for a Reconfigurable Multi-Media System , 2010, IEICE Trans. Inf. Syst..

[17]  Fadi J. Kurdahi,et al.  MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.

[18]  Rudy Lauwereins,et al.  ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix , 2003, FPL.

[19]  Chunming Qiao Efficient matrix operations in a reconfigurable array with spanning optical buses , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.