Compilation, architectural support, and evaluation of SIMD graphics pipeline programs on a general-purpose CPU

Graphics and media processing is rapidly becoming one of the key computing workloads. Programmable graphics processors give designers extra flexibility by running a small program for each fragment in the graphics pipeline. We investigate low-cost mechanisms for obtaining good performance for modern graphics programs on a general-purpose CPU. We present a compiler that compiles SIMD graphics programs and generates efficient code for a general-purpose CPU. For a group of typical graphics programs, the generated code can process between 0.3 and 25 million vertices per second on a 2.2 GHz Intel Pentium® 4 processor. We also evaluate the impact of three changes in the architecture and compiler. Adding support for new specialized instructions improves the performance of the programs by 27.4% on average. A novel compiler optimization called mask analysis improves the performance of the programs by 19.5% on average. Increasing the number of architectural SIMD registers from 8 to 16 significantly reduces the number of memory accesses due to register spills.
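The abstract does not say how mask analysis works, so the following is only an illustrative sketch of the kind of problem such an optimization might address, not the authors' algorithm. It assumes that instructions in a SIMD graphics program carry a per-component write mask (x, y, z, w), and that a CPU without masked writes has to emulate one by blending the new value into the old destination. All names here (`vec4`, `masked_write`, `can_skip_blend`, `emit_mul`, the `live_after` parameter) are hypothetical and chosen for illustration.

```c
#include <assert.h>

/* Four-component register, as used by a vertex program (x, y, z, w). */
typedef struct { float c[4]; } vec4;

/* Write-mask bits: bit i set means component i of the destination is written. */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8, MASK_XYZW = 15 };

/* Component-wise multiply: the plain, unmasked SIMD operation. */
static vec4 vec4_mul(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; i++) r.c[i] = a.c[i] * b.c[i];
    return r;
}

/* General masked write: keep the old value in every unmasked component. */
static vec4 masked_write(vec4 dst, vec4 value, unsigned mask) {
    assert(mask <= MASK_XYZW);
    for (int i = 0; i < 4; i++)
        if (mask & (1u << i)) dst.c[i] = value.c[i];
    return dst;
}

/* ASSUMPTION (not from the paper): the blend is unnecessary when every
 * component that is still read later (live_after) is covered by the write
 * mask, because the unwritten components are then dead and may hold anything. */
static int can_skip_blend(unsigned write_mask, unsigned live_after) {
    return ((~write_mask) & live_after & MASK_XYZW) == 0;
}

/* Emulates "MUL dst.<mask>, a, b", choosing the cheaper path when allowed. */
static vec4 emit_mul(vec4 dst, vec4 a, vec4 b,
                     unsigned write_mask, unsigned live_after) {
    vec4 prod = vec4_mul(a, b);
    if (can_skip_blend(write_mask, live_after))
        return prod;                          /* full-width write, no blend */
    return masked_write(dst, prod, write_mask); /* preserve live old values */
}
```

In this sketch, calling `emit_mul` with a write mask of `MASK_X | MASK_Y` and the same value for `live_after` takes the blend-free path, while passing `MASK_XYZW` as `live_after` forces the blend so that the old z and w values survive. A real compiler would make this decision once at compile time rather than per call.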
