MergeTree: A Fast Hardware HLBVH Constructor for Animated Ray Tracing

Ray tracing is a computationally intensive rendering technique traditionally used in offline high-quality rendering. Powerful hardware accelerators have recently been developed that put real-time ray tracing within reach of even mobile devices. However, rendering animated scenes remains difficult, as updating the acceleration trees for each frame is a memory-intensive process. This article proposes MergeTree, the first hardware architecture for Hierarchical Linear Bounding Volume Hierarchy (HLBVH) construction, designed to minimize memory traffic. For evaluation, the hardware constructor is synthesized on a 28 nm process technology. Compared to a state-of-the-art binned surface area heuristic (SAH) sweep builder, the present work speeds up construction by a factor of 5, reduces build energy by a factor of 3.2, and reduces memory traffic by a factor of 3. By comparison, a software HLBVH builder on a graphics processing unit (GPU) requires 3.3 times more memory traffic. To take tree quality into account, a rendering accelerator is modeled alongside the builder. When a top-level build is used to improve tree quality, the proposed builder reduces system energy per frame by an average of 41% with primary rays and 13% with diffuse rays. In large (> 500K triangles) scenes, the difference is more pronounced: 62% and 35%, respectively.
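The abstract does not spell out the construction algorithm. As general background, LBVH-style builders (HLBVH included) typically begin by assigning each primitive a Morton code from its quantized centroid and sorting primitives by that code, which is why sorting dominates their memory traffic. The following is a minimal software sketch of that first stage only, not the MergeTree hardware design. The function names, the 10-bit-per-axis quantization, and the assumption that centroids are already normalized to the unit cube are illustrative choices, not taken from the article.

```python
def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that bit i moves to bit 3*i."""
    out = 0
    for i in range(10):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def quantize(c: float) -> int:
    """Map a coordinate in [0, 1] to an integer in [0, 1023] (assumed 10-bit grid)."""
    return min(max(int(c * 1024.0), 0), 1023)


def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code: interleave the quantized x, y, z bits."""
    return (
        (expand_bits(quantize(x)) << 2)
        | (expand_bits(quantize(y)) << 1)
        | expand_bits(quantize(z))
    )


def sort_by_morton(centroids):
    """Return primitive indices ordered along the Morton curve.

    `centroids` is a list of (x, y, z) tuples, assumed normalized to the
    unit cube of the scene bounds.
    """
    codes = [morton3d(x, y, z) for (x, y, z) in centroids]
    return sorted(range(len(centroids)), key=lambda i: codes[i])
```

After this sort, spatially nearby primitives sit next to each other in memory, and the hierarchy can be derived from the common bit prefixes of adjacent codes. A call such as `sort_by_morton([(0.9, 0.9, 0.9), (0.1, 0.1, 0.1)])` returns the indices in curve order.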
