Speedups and Energy Savings of Microprocessor Platforms with a Coarse-Grained Reconfigurable Data-Path

This paper presents the performance improvements and energy reductions obtained by coupling a high-performance coarse-grained reconfigurable data-path with a microprocessor in a generic platform. The data-path, previously introduced by the authors, is composed of computational units able to realize complex operations, which help improve the performance of time-critical application parts, called kernels. A design flow is proposed for mapping high-level software descriptions to the microprocessor system. Eight real-life applications are mapped onto three different instances of the system. Significant overall application speedups relative to a software-only solution, ranging from 1.74 to 3.94, are reported; they are close to the theoretical speedup bounds. Average energy savings of 59% are achieved, while the reduction in the system energy-delay product ranges from 66% to 92%.
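
As a rough illustration of how the reported quantities relate, the sketch below computes an Amdahl-style speedup bound and the energy-delay product (EDP) reduction that follows from a given energy saving and speedup. It assumes the theoretical bound is the Amdahl limit in which kernels execute in zero time; the abstract does not define the bound, so this is an assumption. The function names, the kernel fraction, and the kernel speedup are hypothetical and are not taken from the paper.

```python
def amdahl_bound(kernel_fraction: float) -> float:
    """Upper bound on overall speedup if the kernels took zero time.

    kernel_fraction is the share of software-only run time spent in kernels
    (a hypothetical input, not a figure from the paper).
    """
    if not 0.0 <= kernel_fraction < 1.0:
        raise ValueError("kernel_fraction must be in [0, 1)")
    return 1.0 / (1.0 - kernel_fraction)


def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: only the kernel share of run time is accelerated."""
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def edp_reduction(energy_saving: float, speedup: float) -> float:
    """Fractional EDP reduction, with EDP = energy * delay.

    New EDP / old EDP = (1 - energy_saving) / speedup.
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - (1.0 - energy_saving) / speedup


if __name__ == "__main__":
    # Hypothetical application: 75% of run time in kernels, kernels 10x faster.
    f, s_kernel = 0.75, 10.0
    s = overall_speedup(f, s_kernel)
    print(f"overall speedup: {s:.2f} (bound {amdahl_bound(f):.2f})")
    print(f"EDP reduction at 50% energy saving: {edp_reduction(0.5, s):.1%}")
```

Under this model the EDP reduction always exceeds the energy saving whenever the speedup is greater than 1, which is consistent with the reported EDP reductions (66% to 92%) being larger than the average energy saving (59%).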
