Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions

Three-dimensional (3D) graphics applications have become very important workloads running on today's computer systems. A cost-effective graphics solution is to perform geometry processing of 3D graphics on the host CPU and have specialized hardware handle the rendering task. In this paper, we analyze microarchitecture and SIMD instruction set enhancements to a RISC superscalar processor for exploiting parallelism in geometry processing for 3D computer graphics. Our results show that 3D geometry processing has inherent parallelism. Adding SIMD operations improves performance from 8 percent to 28 percent on a 4-issue dynamically scheduled processor that can issue at most two floating-point operations. In comparison, an 8-issue processor, ignoring cycle time effects, can achieve 20 to 60 percent performance improvement over a 4-issue. If processor cycle time scales with the number of ports to the register file, then doubling only the floating-point issue width of a 4-issue processor with SIMD instructions gives the best performance among the architectural configurations that we examine (the most aggressive configuration is an 8-issue processor with SIMD instructions).

[1]  Marc Tremblay,et al.  The visual instruction set (VIS) in UltraSPARC , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[2]  Norman P. Jouppi,et al.  Performance of image and video processing with general-purpose processors and media ISA extensions , 1999, ISCA.

[3]  Norman P. Jouppi,et al.  Memory-System Design Considerations for Dynamically-Scheduled Processors , 1997, ISCA.

[4]  Norman P. Jouppi,et al.  CACTI: an enhanced cache access and cycle time model , 1996, IEEE J. Solid State Circuits.

[5]  Norman P. Jouppi,et al.  Register file design considerations in dynamically scheduled processors , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[6]  Gerry Kane,et al.  PA-RISC 2.0 Architecture , 1995 .

[7]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[8]  Norman P. Jouppi,et al.  WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches , 1994 .

[9]  Michael D. Smith,et al.  Geust Editorial: Media processing: a new design target , 1996, IEEE Micro.

[10]  Norman P. Jouppi,et al.  Quantifying the Complexity of Superscalar Processors , 2002 .

[11]  Robert Sims,et al.  Alpha architecture reference manual , 1992 .

[12]  John S. Montrym,et al.  InfiniteReality: a real-time graphics system , 1997, SIGGRAPH.

[13]  P. Chow,et al.  Memory-system Design Considerations For Dynamically-scheduled Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[14]  Thomas Ertl,et al.  Computer Graphics - Principles and Practice, 3rd Edition , 2014 .

[15]  M. Carter Computer graphics: Principles and practice , 1997 .

[16]  大野 義夫,et al.  Computer Graphics : Principles and Practice, 2nd edition, J.D. Foley, A.van Dam, S.K. Feiner, J.F. Hughes, Addison-Wesley, 1990 , 1991 .

[17]  John G. Eyles,et al.  PixelFlow: high-speed rendering using image composition , 1992, SIGGRAPH.

[18]  Chia-Lin Yang,et al.  Exploiting instruction level parallelism in geometry processing for three dimensional graphics applications , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.