Active thread compaction for GPU path tracing

Modern GPUs such as NVIDIA's Fermi internally operate in a SIMD manner, grouping 32 scalar threads together into SIMD warps; if a warp's threads diverge, the warp serially executes both branches, temporarily disabling the threads that are not on the current path. In this paper, we explore and thoroughly analyze active thread compaction (i.e., the process of taking multiple partially filled warps and compacting them into fewer but fully utilized warps) in the context of a CUDA path tracer. Our results show that this technique can indeed lead to significant improvements in SIMD utilization, with corresponding savings in the amount of work performed. However, they also show that certain inadequacies of today's hardware wipe out most of the achieved gains, leaving bottom-line speed-ups of a mere 12–16%. We believe our analysis of why this is the case will provide insight to other researchers experimenting with this technique in different contexts.
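The compaction idea can be illustrated with a minimal host-side sketch in Python. It assumes that each thread carries a 0/1 flag marking whether its path is still active. It uses a standard stream-compaction formulation: an exclusive prefix sum over the flags, followed by a scatter of the active thread IDs into a dense array. This is not claimed to be the paper's implementation, and all function names are hypothetical. The sketch also ignores the cost of the compaction pass and of moving per-path state, so the utilization it reports is not a speed-up estimate.

```python
WARP_SIZE = 32  # warp width of 32 threads, as described for Fermi in the text


def exclusive_scan(flags):
    """Exclusive prefix sum of 0/1 flags: the output slot of each active thread,
    plus the total number of active threads."""
    offsets, running = [], 0
    for f in flags:
        offsets.append(running)
        running += f
    return offsets, running


def compact_active(active):
    """Stream compaction: scatter the IDs of still-active threads into a dense array
    so they can be re-launched as fewer, fully populated warps (hypothetical helper)."""
    offsets, count = exclusive_scan(active)
    dense = [None] * count
    for tid, is_active in enumerate(active):
        if is_active:
            dense[offsets[tid]] = tid
    return dense


def occupied_warps(active):
    """Warps that contain at least one active thread and therefore must still be issued."""
    return sum(any(active[w:w + WARP_SIZE]) for w in range(0, len(active), WARP_SIZE))


def warps_for(n):
    """Warps needed to run n densely packed threads (ceiling division)."""
    return -(-n // WARP_SIZE)


def simd_utilization(num_active, num_warps):
    """Fraction of issued SIMD lanes that do useful work."""
    return num_active / (num_warps * WARP_SIZE) if num_warps else 0.0


# Four warps (128 threads) in which only every third path is still alive.
active = [1 if tid % 3 == 0 else 0 for tid in range(4 * WARP_SIZE)]
dense = compact_active(active)
before = occupied_warps(active)
after = warps_for(len(dense))
print(len(dense), before, after)                       # 43 4 2
print(round(simd_utilization(len(dense), before), 2))  # 0.34
print(round(simd_utilization(len(dense), after), 2))   # 0.67
```

With one path in three still alive, the 43 active threads are spread over four warps at about 34% lane utilization. After compaction, they fit into two warps at about 67%. On a GPU, the scan and scatter would themselves run as parallel work, and this sketch does not model their cost.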
