Cache-aware sampling strategies for texture-based ray casting on GPU

Ray casting, a major component of volume rendering, is memory-intensive. Most existing texture-based volume rendering methods map computational resources to texture memory without regard to locality, which produces an incoherent access pattern and, in certain cases, low cache hit rates. The distance between samples taken by threads of the same GPU scheduling unit (e.g. a warp of 32 threads in CUDA) is a major factor affecting the texture cache hit rate. Based on this observation, we present a new sampling strategy, warp marching, which uses a novel computation-to-core mapping. In addition, we introduce a double-buffer approach and leverage special GPU operations to improve the efficiency of parallel execution. To keep rendering performance roughly constant while the volume is rotated, we adapt the warp marching algorithm so that samples can be taken along different directions of the volume. As a result, the varying texture cache hit rates of different viewing directions are averaged out. Through a series of micro-benchmarks and experiments on real-world data, we analyze our sampling strategies and demonstrate significant performance improvements over existing sampling methods.
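To make the role of inter-sample distance concrete, the following is a minimal sketch, not the authors' implementation. It compares how far apart the samples of one 32-thread warp lie in a single step under two mappings. The "warp-level" mapping shown here, in which the whole warp shares one ray and each thread takes one of 32 consecutive samples, is only an assumed reading of warp marching. All function names, the orthographic setup, and the pixel spacing and step size are hypothetical values chosen for illustration.

```python
import math

WARP_SIZE = 32  # threads per warp in CUDA, as stated in the abstract


def add_scaled(p, d, t):
    """Return the point p + t * d for 3D tuples p and d."""
    return tuple(pi + t * di for pi, di in zip(p, d))


def max_pairwise_distance(points):
    """Largest Euclidean distance between any two points in the list."""
    return max(math.dist(a, b) for a in points for b in points)


def per_ray_samples(origin, right, direction, pixel_spacing, step, k):
    """Conventional mapping: thread i owns ray i and takes that ray's k-th sample."""
    return [
        add_scaled(add_scaled(origin, right, i * pixel_spacing), direction, k * step)
        for i in range(WARP_SIZE)
    ]


def warp_samples(origin, direction, step, k):
    """Assumed warp-level mapping: the warp shares one ray and, in round k,
    thread i takes sample number k * WARP_SIZE + i along it."""
    return [
        add_scaled(origin, direction, (k * WARP_SIZE + i) * step)
        for i in range(WARP_SIZE)
    ]


# Hypothetical setup: orthographic rays along +z, neighbouring pixels along +x,
# all lengths measured in voxels.
origin = (0.0, 0.0, 0.0)
right = (1.0, 0.0, 0.0)
direction = (0.0, 0.0, 1.0)

spread_per_ray = max_pairwise_distance(
    per_ray_samples(origin, right, direction, pixel_spacing=2.0, step=0.5, k=3)
)
spread_warp = max_pairwise_distance(warp_samples(origin, direction, step=0.5, k=3))

print("per-ray mapping spread:", spread_per_ray)
print("warp-level mapping spread:", spread_warp)
```

In this setup the per-ray spread is governed by the pixel spacing and the warp-level spread by the step size, so which mapping keeps a warp's samples closer together depends on those parameters and on how the texture is laid out in memory. The sketch measures geometric distance only and does not model the texture cache.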
