Architecture considerations for tracing incoherent rays

This paper proposes a massively parallel hardware architecture for efficiently tracing incoherent rays, such as those arising in global illumination. The approach centers on subdividing the acceleration structure hierarchically into treelets and on repeatedly queueing (postponing) rays to reduce cache pressure. We describe a heuristic algorithm for determining the treelet subdivision and show that the architecture can reduce total memory bandwidth requirements by up to 90% in difficult scenes. Furthermore, the architecture allows rays to be submitted in arbitrary order with practically no performance penalty. We also conclude that the choice of scheduling algorithm can have an important effect on results, and that fixed-size queues are not an appealing design choice. Increased auxiliary traffic, including traffic for traversal stacks, is identified as the foremost remaining challenge of this architecture.
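To make the two ideas in the abstract concrete, here is a minimal software sketch of treelet subdivision plus per-treelet ray queueing. It is an illustration under stated assumptions, not the paper's hardware or its exact heuristic. The greedy "absorb the largest-surface-area frontier node" rule, the "longest queue first" scheduler, the single-resident-treelet cost model, and every name (`Node`, `form_treelets`, `trace_in_order`, `trace_with_queues`) are hypothetical. Each ray is also assumed to come with a precomputed list of the BVH nodes it visits, so the sketch only counts treelet loads and performs no real intersection tests.

```python
import heapq
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Node:
    area: float                                         # surface area of the node's bounding box
    children: list[int] = field(default_factory=list)  # child indices; empty for a leaf


def form_treelets(nodes, root, max_nodes):
    """Hypothetical greedy subdivision: grow a treelet from its root by repeatedly
    absorbing the frontier node with the largest surface area (a rough proxy for
    visit probability) until it holds max_nodes nodes. Frontier nodes that did
    not fit become the roots of new treelets."""
    treelet_of = {}
    pending_roots = deque([root])
    next_id = 0
    while pending_roots:
        r = pending_roots.popleft()
        frontier = [(-nodes[r].area, r)]                # max-heap via negated area
        size = 0
        while frontier and size < max_nodes:
            _, n = heapq.heappop(frontier)
            treelet_of[n] = next_id
            size += 1
            for c in nodes[n].children:
                heapq.heappush(frontier, (-nodes[c].area, c))
        pending_roots.extend(n for _, n in frontier)    # leftovers start new treelets
        next_id += 1
    return treelet_of


def trace_in_order(paths, treelet_of):
    """Baseline: trace rays one after another with room for a single resident
    treelet; every switch to a different treelet counts as one load."""
    loads, resident = 0, None
    for path in paths:
        for node in path:
            if treelet_of[node] != resident:
                resident = treelet_of[node]
                loads += 1
    return loads


def trace_with_queues(paths, treelet_of):
    """Queued traversal: rays wait in per-treelet queues. The scheduler (here
    simply 'longest queue first') loads one treelet, advances every ray queued
    there until it leaves the treelet, and re-queues it where it goes next."""
    queues = defaultdict(deque)
    for ray, path in enumerate(paths):
        queues[treelet_of[path[0]]].append((ray, 0))   # (ray id, position in its path)
    loads = 0
    while any(queues.values()):
        t = max(queues, key=lambda k: len(queues[k]))
        batch, queues[t] = queues[t], deque()
        loads += 1
        for ray, pos in batch:
            path = paths[ray]
            while pos < len(path) and treelet_of[path[pos]] == t:
                pos += 1                                # node is resident: visit it
            if pos < len(path):                         # ray left the treelet: postpone it
                queues[treelet_of[path[pos]]].append((ray, pos))
    return loads


# Toy input (made up): a 7-node binary BVH and per-ray node visit orders.
NODES = [Node(10, [1, 2]), Node(6, [3, 4]), Node(5, [5, 6]),
         Node(3), Node(2), Node(2.5), Node(1)]
PATHS = [[0, 1, 3, 4, 2, 5], [0, 2, 6, 5, 1, 4], [0, 1, 3], [0, 2, 5, 6]]

if __name__ == "__main__":
    treelets = form_treelets(NODES, root=0, max_nodes=3)
    print(trace_in_order(PATHS, treelets), trace_with_queues(PATHS, treelets))
```

On this toy input the script prints `15 9`. Batching the rays that wait at the same treelet means each treelet is fetched fewer times than when rays are traced one by one in submission order. These counts come from a made-up example and are not measurements from the paper. The queues here are unbounded deques, which sidesteps the fixed-size-queue issue the abstract mentions.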
