An Embedded Stream Processor Core Based on Logarithmic Arithmetic for a Low-Power 3-D Graphics SoC

A low-power, high-performance 4-way 32-bit stream processor core is developed for handheld low-power 3-D graphics systems. It contains a floating-point unified matrix, vector, and elementary function unit. By exploiting logarithmic arithmetic and the proposed adaptive number conversion scheme, the 4-way arithmetic unit achieves single-cycle throughput for all of these operations except matrix-vector multiplication, which takes 2 cycles per result, down from 4 cycles in the conventional approach. The processor combines this functional unit with several proposed architectural schemes, including embedded register index calculations, functional unit reconfiguration, and operand forwarding in the logarithmic domain, and achieves a 19.1% cycle count reduction for the OpenGL transformation and lighting (TnL) operation compared with the latest work. The proposed stream processor core is integrated into a 3-D graphics SoC as a vertex shader to show its effectiveness. The entire SoC is fabricated as a test chip in a 1-poly 6-metal 0.18 µm CMOS technology. The 17.2 mm² chip contains 1.57 M transistors and 29 kB of SRAM. The stream processor core occupies 9.7 mm² and dissipates 86.8 mW at an operating frequency of 200 MHz. It shows a peak performance of 141 Mvertices/s for geometry transformation (TFM), a 17.5% performance improvement over the latest work, with power and area for the TFM reduced by 44.7% and 39.4%, respectively. For power management of the SoC, the chip is divided into three power domains, each separately controlled by dynamic voltage and frequency scaling (DVFS). With this scheme, the chip consumes 52.4 mW at 60 fps, a 50.5% power reduction compared with the latest work.
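The appeal of logarithmic arithmetic for such a unit can be illustrated with a minimal Python sketch. This is not the chip's number format or datapath: it uses double-precision `math.log2` instead of fixed-point hardware converters, and every name here (`to_log`, `from_log`, `log_mul`, `log_div`, `log_sqrt`, `dot4`) is hypothetical. It only shows the general principle that multiplication, division, and square root reduce to addition, subtraction, and halving once operands are in the log domain, while sums are formed after converting back to the linear domain.

```python
import math

def to_log(x):
    """Encode a nonzero real as (is_negative, log2|x|).
    Zero has no finite logarithm and is not handled in this sketch."""
    if x == 0:
        raise ValueError("zero is not representable in this sketch")
    return (x < 0, math.log2(abs(x)))

def from_log(v):
    """Decode (is_negative, log2|x|) back to a linear-domain float."""
    negative, exponent = v
    magnitude = 2.0 ** exponent
    return -magnitude if negative else magnitude

def log_mul(a, b):
    """Multiplication becomes an addition of logarithms."""
    return (a[0] != b[0], a[1] + b[1])

def log_div(a, b):
    """Division becomes a subtraction of logarithms."""
    return (a[0] != b[0], a[1] - b[1])

def log_sqrt(a):
    """Square root becomes a halving of the logarithm (positive inputs only)."""
    if a[0]:
        raise ValueError("square root of a negative value")
    return (False, a[1] / 2.0)

def dot4(u, v):
    """4-element dot product: products in the log domain,
    accumulation in the linear domain (assumed arrangement)."""
    products = (log_mul(to_log(x), to_log(y)) for x, y in zip(u, v))
    return sum(from_log(p) for p in products)

if __name__ == "__main__":
    print(from_log(log_mul(to_log(3.0), to_log(-4.0))))
    print(from_log(log_div(to_log(10.0), to_log(4.0))))
    print(from_log(log_sqrt(to_log(2.0))))
    print(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```

Because the conversions go through floating-point logarithms, results agree with ordinary arithmetic only up to rounding error; a hardware implementation would trade conversion accuracy against area and power in its own way.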
