A Multi-Accelerator Architecture for Photon Mapping

Real-time rendering of photorealistic images has always been an important goal in computer graphics. The most computationally expensive part of this process is obtaining the effects of global illumination. Photon mapping is a well-known technique for calculating realistic global illumination, and it also shows characteristics that we believe make it favorable for dedicated hardware acceleration.

Online arithmetic is a digit-serial form of arithmetic in which input operands are processed from the most significant digit down to the least significant, and the result is likewise produced one digit per step. Pipelined online arithmetic circuits are extremely regular and require only simple calculations between registers, which allows for high clock speeds and low power dissipation with a huge potential for parallel execution.

Combining these two concepts, we design and evaluate MAPM (Multi-Accelerator for Photon Mapping), a multi-accelerator architecture that employs pipelined online arithmetic to accelerate the two most time-consuming operations in photon mapping: the tree search and the shader operation. On a VHDL implementation, we perform behavioral verification using ModelSim, examine hardware cost with Synopsys tools, and evaluate throughput gain and scalability of the architecture using a custom-built cycle-accurate simulator based on the Intel Pin tool.

By employing two MAPMs, each configured with 16 Tree Search Modules, 16 Shader Operation Modules, and 2 Shader Operation Accelerators per Shader Operation Module, we observed a throughput increase of 1384x over an optimized software setup and an increase of 4.78x over a recent MPSoC implementation. This is achieved at an acceptable hardware cost of 28.8% of the bandwidth, 22.2% of the area, and 5.6% of the power consumption of the low-end Intel Celeron G1820T.

The MAPM also shows a significant reduction in power dissipation. Compared to a conventional parallel circuit with equivalent functionality, the MAPM achieved a synthesizable clock speed of about 3.5x, dynamic power consumption of 0.104x, and area cost of 1.799x.
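To make the digit-serial, most-significant-digit-first behavior of online arithmetic concrete, the following is a minimal software sketch of a textbook-style radix-2 online multiplication. It is not the MAPM datapath: the function names, the online delay of 3, the selection thresholds at ±1/2, and the use of exact rational residuals (instead of the truncated estimates a hardware unit would use) are all assumptions made for illustration. Operands are signed-digit fractions with digits in {-1, 0, 1}.

```python
from fractions import Fraction

DELTA = 3  # assumed online delay: input digits consumed before the first output digit


def value(digits):
    """Value of the signed-digit fraction 0.d1 d2 ... with each d_i in {-1, 0, 1}."""
    return sum(Fraction(d, 2 ** i) for i, d in enumerate(digits, start=1))


def online_multiply(x_digits, y_digits):
    """Yield product digits MSD first, consuming one digit of each operand per step."""
    n = len(x_digits)
    assert len(y_digits) == n
    # Zero digits are fed in for the last DELTA steps, after the operands end.
    xs = list(x_digits) + [0] * DELTA
    ys = list(y_digits) + [0] * DELTA
    x_acc = Fraction(0)  # operand prefix X[k-1] at the top of each step
    y_acc = Fraction(0)  # operand prefix Y[k] once updated inside the step
    w = Fraction(0)      # scaled residual between the prefix product and emitted digits
    for k in range(1, n + DELTA + 1):
        xd, yd = xs[k - 1], ys[k - 1]
        y_acc += Fraction(yd, 2 ** k)
        # Residual recurrence: shift left, then add the new partial-product terms.
        v = 2 * w + (xd * y_acc + yd * x_acc) / 2 ** DELTA
        x_acc += Fraction(xd, 2 ** k)
        if k <= DELTA:
            w = v  # warm-up steps: accumulate only, no digit is produced yet
            continue
        # Digit selection by comparing the residual against +/- 1/2.
        if v >= Fraction(1, 2):
            p = 1
        elif v < Fraction(-1, 2):
            p = -1
        else:
            p = 0
        w = v - p
        yield p


if __name__ == "__main__":
    x = [1, 0, -1, 1, 0, 1, 1, -1]
    y = [1, 1, 0, -1, 0, 1, -1, 1]
    p = list(online_multiply(x, y))
    error = abs(value(x) * value(y) - value(p))
    print(p, float(error))
```

Each iteration touches only the newest input digits and a small residual, which is the property that makes such units regular and easy to pipeline. In this sketch the n-digit result differs from the exact product by less than 2^-n.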
