A 2.05 GVertices/s 151 mW Lighting Accelerator for 3D Graphics Vertex and Pixel Shading in 32 nm CMOS

This paper describes a single-cycle throughput lighting accelerator fabricated in 1.05 V, 32 nm CMOS for on-die acceleration of 3D graphics vertex and pixel shading in high-performance processors and mobile systems-on-chip. The accelerator computes ambient, diffuse, and specular lighting in parallel in the log domain, using high-accuracy 32b log and anti-log units that convert computation from the floating-point (FP) to the fixed-point domain, 32b sparse-tree fixed-point adders, and a 32 × 32b signed fixed-point multiplier with a truncated partial-product reduction tree. Together these enable 2.05 GVertices/s throughput measured at 1.05 V, 25 °C in an area of 0.064 mm², while achieving: (i) 47% reduction in critical path logic stages compared to previously published work, (ii) 0.56% mean vertex lighting error compared to single-precision FP computation, (iii) 354 μW active leakage power measured at 1.05 V, 25 °C, (iv) scalable performance up to 2.22 GHz, 232 mW measured at 1.2 V, (v) peak energy efficiency of 56 GVertices/s/W measured at 560 mV, 25 °C, and (vi) 119.6 dB PSNR for a 2 M pixel high-resolution 3D image.
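As a rough illustration of why log-domain computation suits lighting, the sketch below evaluates an ambient + diffuse + specular sum in which each multiplication becomes an addition of base-2 logarithms and the specular power becomes a single multiplication by the exponent. It is a floating-point model only, not the paper's fixed-point datapath. The Phong-style formulation, the function names (`log_domain_mul`, `log_domain_pow`, `lighting_intensity`), and the parameter names are all assumptions made for this example.

```python
import math


def log_domain_mul(a, b):
    """Multiply two positive values by adding their base-2 logarithms."""
    if a <= 0.0 or b <= 0.0:
        return 0.0  # log2 is undefined here; the product contributes no light
    return 2.0 ** (math.log2(a) + math.log2(b))


def log_domain_pow(base, exponent):
    """Compute base**exponent as 2**(exponent * log2(base)) for base > 0."""
    if base <= 0.0:
        return 0.0
    return 2.0 ** (exponent * math.log2(base))


def lighting_intensity(ka, kd, ks, n_dot_l, n_dot_h, shininess):
    """Hypothetical Phong-style sum of ambient, diffuse, and specular terms.

    ka, kd, ks are assumed material/light coefficients; n_dot_l and n_dot_h
    are assumed precomputed dot products, clamped to zero when negative.
    """
    ambient = ka
    diffuse = log_domain_mul(kd, max(n_dot_l, 0.0))
    specular = log_domain_mul(ks, log_domain_pow(max(n_dot_h, 0.0), shininess))
    # The three terms are independent, so hardware could evaluate them in
    # parallel; only this final sum needs the values back in the linear domain.
    return ambient + diffuse + specular


if __name__ == "__main__":
    print(lighting_intensity(0.1, 0.5, 0.4, 0.5, 0.5, 2.0))
```

In this model the only operations left in the log domain are additions and one multiplication by the shininess exponent, which is consistent with the abstract's pairing of log and anti-log converters with fixed-point adders and a multiplier.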
