Latency Considerations of Depth-first GPU Ray Tracing

Despite the potential divergence of depth-first ray tracing [AL09], it is nevertheless the most efficient approach on massively parallel graphics processors. Due to the use of specialized caching strategies that were originally developed for texture access, it has been shown to be compute rather than bandwidth limited. Especially with recents developments however, not only the raw bandwidth, but also the latency for both memory access and read after write register dependencies can become a limiting factor. In this paper we will analyze the memory and instruction dependency latencies of depth first ray tracing. We will show that ray tracing is in fact latency limited on current GPUs and propose three simple strategies to better hide the latencies. This way, we come significantly closer to the maximum performance of the GPU.