A VLIW Processor With Hardware Functions: Increasing Performance While Reducing Power

This brief presents a heterogeneous multicore embedded processor architecture designed to exceed the performance of traditional embedded processors while consuming less power than low-power embedded processors. At the heart of this architecture is a multicore very long instruction word (VLIW) core containing homogeneous execution cores/functional units. In addition, heterogeneous combinational hardware function cores are tightly integrated with the VLIW core, providing an opportunity for improved performance and reduced energy consumption. Our processor has been synthesized for both a 90-nm Stratix II field-programmable gate array and a 160-nm cell-based application-specific integrated circuit from Oki, each operating at a core frequency of 167 MHz. For selected multimedia and signal processing benchmarks, we show that this processor provides kernel performance improvements averaging 179X over an Intel StrongARM and 36X over an Intel XScale, leading to application speedups averaging 30X over StrongARM and 10X over XScale.
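The gap between kernel speedup and application speedup can be illustrated with Amdahl's law. The sketch below is not from the brief: it assumes a single accelerated kernel region per application and treats the reported averages as if they described one representative benchmark, which averages do not strictly allow. The function names `kernel_fraction` and `overall_speedup` are hypothetical helpers introduced only for this illustration.

```python
def overall_speedup(kernel_speedup: float, fraction: float) -> float:
    """Amdahl's law: application speedup when `fraction` of baseline
    run time is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def kernel_fraction(kernel_speedup: float, app_speedup: float) -> float:
    """Amdahl's law solved for the fraction of baseline run time that
    must lie in the kernel to yield `app_speedup` (an assumption-laden
    estimate, not a figure reported in the brief)."""
    return (1.0 - 1.0 / app_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    # Averages quoted in the abstract: (kernel speedup, application speedup).
    reported = {"StrongARM": (179.0, 30.0), "XScale": (36.0, 10.0)}
    for baseline, (s_kernel, s_app) in reported.items():
        f = kernel_fraction(s_kernel, s_app)
        print(f"{baseline}: implied kernel share of run time is about {f:.1%}")
```

Under these assumptions, the quoted figures would imply that the accelerated kernels account for well over 90% of baseline run time, which is why even very large kernel speedups translate into more modest application speedups.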
