A Tile-Based EGPU with a Fused Universal Processing Engine and Graphics Coprocessor Cluster

As more sensors are integrated into embedded devices, the Embedded Graphics Processing Unit (EGPU) has taken on more processing tasks, which calls for higher EGPU performance. This work proposes a tile-based EGPU that supports both general-purpose computing and 3D graphics rendering. With a fused, scalable, and hierarchically parallel architecture, the EGPU can handle nearly 100 million vertices or fragments and achieves 1 GFLOPS at a clock frequency of 200 MHz. The fused and scalable architecture, made up of a Universal Processing Engine (UPE) and a Graphics Coprocessor Cluster (GCC), lets the EGPU adapt to a range of graphics processing scenarios and render more efficiently. Hierarchical parallelism is implemented within the UPE. In addition, tiling significantly reduces both system memory bandwidth and power consumption. Timing and power were analyzed with a 0.18 µm technology library. The proposed EGPU occupies 6.5 mm × 6.5 mm and consumes approximately 349.318 mW. Experimental results show that the proposed EGPU can be integrated into a System on Chip (SoC) connected to sensors to accelerate the SoC's processing while striking a balance between performance and cost.
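The sketch below is a hedged illustration of two points in the abstract. Part 1 derives simple ratios from the quoted figures (1 GFLOPS, 200 MHz, 349.318 mW, 6.5 mm × 6.5 mm); it assumes the "nearly 100 million vertices or fragments" figure is a per-second rate, which the abstract does not state. Part 2 shows generic tile binning and an idealized color-buffer traffic comparison to illustrate why tiling can cut system memory bandwidth. It is not the paper's design: the tile size `TILE = 16` and the helper names `derived_metrics`, `bin_triangles`, and `color_write_bytes` are hypothetical.

```python
import math
from collections import defaultdict

# --- Part 1: ratios derived from the figures quoted in the abstract ---
CLOCK_HZ = 200e6        # 200 MHz
PEAK_FLOPS = 1e9        # 1 GFLOPS
ITEMS_PER_S = 100e6     # "nearly 100 million vertices or fragments"; ASSUMED to be per second
POWER_W = 349.318e-3    # 349.318 mW
DIE_SIDE_MM = 6.5       # 6.5 mm x 6.5 mm


def derived_metrics():
    """Aggregate ratios implied by the quoted figures (whole EGPU, not per unit)."""
    return {
        "flops_per_cycle": PEAK_FLOPS / CLOCK_HZ,
        "items_per_cycle": ITEMS_PER_S / CLOCK_HZ,
        "area_mm2": DIE_SIDE_MM ** 2,
        "gflops_per_watt": (PEAK_FLOPS / 1e9) / POWER_W,
    }


# --- Part 2: generic tile binning (illustrative, not the paper's design) ---
TILE = 16  # HYPOTHETICAL tile edge in pixels; the abstract does not give the tile size


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each screen tile (tx, ty) to the indices of triangles whose
    bounding box overlaps it. Bounding-box binning is conservative: a
    triangle may be listed in a tile it does not actually cover."""
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        x0 = max(0, math.floor(min(xs)) // tile)
        x1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        y0 = max(0, math.floor(min(ys)) // tile)
        y1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


def color_write_bytes(width, height, overdraw, bytes_per_pixel=4):
    """Idealized off-chip color-buffer writes per frame.
    Immediate mode: every shaded fragment is assumed to be written to system memory.
    Tiled: fragments resolve in an on-chip tile buffer, so each pixel is written once.
    Ignores depth, texture, and binned-geometry (tile list) traffic."""
    immediate = width * height * overdraw * bytes_per_pixel
    tiled = width * height * bytes_per_pixel
    return immediate, tiled
```

Under these assumptions, 1 GFLOPS at 200 MHz corresponds to 5 floating-point operations per cycle across the whole EGPU; the abstract does not say how that throughput is split between the UPE and the GCC. The traffic model deliberately omits the extra memory traffic a tiler spends writing and re-reading its per-tile geometry lists, so it overstates the savings.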
