A Dual-Shader 3-D Graphics Processor With Fast 4-D Vector Inner Product Units and Power-Aware Texture Cache

This paper presents a fully programmable 3-D graphics processor with unified shaders for mobile environments. At the system level, we adopt dual-core, dual-issue VLIW, and multithreading techniques to exploit instruction-, data-, and task-level parallelism in graphics applications. At the shader-core level, we propose a novel IEEE-754-compliant 4-D vector inner product arithmetic unit and a configurable texture cache. With these techniques, the proposed processor achieves 143 Mvertices/s and 2.3 Gtexels/s while consuming 367 mW. For real graphics applications, test results show a 2.07× improvement in performance and a 34% reduction in power-delay product compared with previous mobile 3-D graphics processors. The processor is implemented on a 4.5 mm × 4.52 mm die in 0.18-μm CMOS technology.
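The abstract does not describe how the 4-D vector inner product unit is built. The Python sketch below is a hedged illustration of why a fused unit can matter numerically. It contrasts two ways of computing a0*b0 + a1*b1 + a2*b2 + a3*b3 in IEEE-754 binary32. The first rounds after every multiply and every add, as a chained multiply-add datapath would. The second rounds the exact sum only once. Treat the single-rounding behavior as an assumption about a fused design, not as a description of the proposed unit. The sketch models numerics only, not latency, and all function names are hypothetical.

```python
from fractions import Fraction

# IEEE-754 binary32 (single precision) parameters.
F32_PRECISION = 24   # significand bits, including the implicit leading 1
F32_MIN_EXP = -126   # exponent of the smallest normal number


def round_to_f32(q: Fraction) -> float:
    """Round an exact rational to the nearest binary32 value, ties to even.

    Normal and subnormal results are handled; overflow to infinity is not.
    """
    if q == 0:
        return 0.0
    sign = -1.0 if q < 0 else 1.0
    q = abs(q)
    # Find e with 2**e <= q < 2**(e + 1).
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1
    e = max(e, F32_MIN_EXP)  # below the normal range the ulp stays fixed (subnormals)
    ulp = Fraction(2) ** (e - F32_PRECISION + 1)
    m = round(q / ulp)       # Fraction.__round__ rounds half to even
    return sign * float(m * ulp)


def quantize(v):
    """Quantize a 4-element vector of Python numbers to binary32, kept as exact Fractions."""
    if len(v) != 4:
        raise ValueError("expected a 4-D vector")
    return [Fraction(round_to_f32(Fraction(x))) for x in v]


def dot4_stepwise(a, b) -> float:
    """Round after every multiply and every add (hypothetical chained datapath)."""
    acc = 0.0
    for x, y in zip(quantize(a), quantize(b)):
        prod = round_to_f32(x * y)
        acc = round_to_f32(Fraction(acc) + Fraction(prod))
    return acc


def dot4_single_rounding(a, b) -> float:
    """Form the exact sum of the four products, then round once (assumed fused behavior)."""
    exact = sum(x * y for x, y in zip(quantize(a), quantize(b)))
    return round_to_f32(exact)


if __name__ == "__main__":
    a = [16777216.0, 1.0, -16777216.0, 0.0]  # 2**24 is exactly representable in binary32
    b = [1.0, 1.0, 1.0, 1.0]
    print(dot4_stepwise(a, b))         # 0.0: 2**24 + 1 rounds back to 2**24
    print(dot4_single_rounding(a, b))  # 1.0: the exact result
```

Here 2^24 + 1 is not representable in binary32. The stepwise version therefore loses the 1.0 term before the large values cancel, while the single-rounding version returns the exact result.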
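The abstract also does not say which parameter of the texture cache is configurable. One common approach in power-aware caches is to switch off ways of a set-associative array, which trades hit rate for lower energy per access. The sketch below assumes that approach and models hit rate only. A hypothetical way-configurable LRU cache replays the texel addresses touched by 2 × 2 bilinear footprints over a row-major texture. The set count, line size, texel size, texture size, and memory layout are all illustrative assumptions, not values from the paper.

```python
from collections import OrderedDict


class WayConfigurableCache:
    """Set-associative LRU cache whose number of active ways can be reduced at run time.

    Hypothetical model: it counts hits and misses only and does not estimate power.
    """

    def __init__(self, num_sets: int, max_ways: int, line_bytes: int) -> None:
        self.num_sets = num_sets
        self.max_ways = max_ways
        self.line_bytes = line_bytes
        self.active_ways = max_ways
        # Each set maps tag -> None, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def set_active_ways(self, ways: int) -> None:
        """Disable ways; lines that no longer fit are evicted in LRU order."""
        if not 1 <= ways <= self.max_ways:
            raise ValueError(f"ways must be in 1..{self.max_ways}")
        self.active_ways = ways
        for lines in self.sets:
            while len(lines) > ways:
                lines.popitem(last=False)

    def access(self, addr: int) -> bool:
        """Look up a byte address; return True on a hit and update the LRU order."""
        line = addr // self.line_bytes
        lines = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in lines:
            lines.move_to_end(tag)
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.active_ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


def bilinear_trace(width: int, height: int, texel_bytes: int = 4):
    """Yield byte addresses of 2x2 bilinear footprints over a row-major texture."""
    for y in range(height - 1):
        for x in range(width - 1):
            for dy in (0, 1):
                for dx in (0, 1):
                    yield ((y + dy) * width + (x + dx)) * texel_bytes


def hit_rate(active_ways: int) -> float:
    # Hypothetical configuration: 8 sets x 32-byte lines, so one way holds 256 bytes,
    # which is exactly one row of the 64 x 64, 4-byte-per-texel texture below.
    cache = WayConfigurableCache(num_sets=8, max_ways=4, line_bytes=32)
    cache.set_active_ways(active_ways)
    for addr in bilinear_trace(64, 64):
        cache.access(addr)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    for ways in (1, 2, 4):
        print(f"{ways} active way(s): hit rate {hit_rate(ways):.3f}")
```

With these parameters, one way of the cache holds exactly one texture row. A single active way therefore forces the two rows of each footprint to compete for the same sets, and enabling a second way removes that conflict. Under LRU, adding ways never lowers the hit count, so the model exposes the trade-off a power-aware controller would have to balance.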
