Toward a multicore architecture for real-time ray-tracing

Significantly improving the visual quality of real-time 3D graphics requires modeling complex illumination effects such as soft shadows, reflections, and diffuse lighting interactions. The conventional GPU model, driven by the Z-buffer algorithm, does not provide sufficient support for these effects. This paper targets the entire graphics system stack and demonstrates algorithms, a software architecture, and a hardware architecture for real-time rendering, with a paradigm shift to ray tracing. Our system, called Copernicus, has three distinguishing features: support for dynamic scenes, high image quality, and execution on programmable multicore architectures. The focus of this paper is the synergy and interaction between applications, architecture, and evaluation. First, we describe the ray-tracing algorithms, which are designed to use redundancy and partitioning to achieve locality. Second, we describe the architecture, which uses ISA specialization, uses multithreading to hide memory delays, and supports only local coherence. Finally, we develop an analytical performance model for our 128-core system, using measurements from simulation and a scaled-down prototype system. More generally, this paper addresses the important issue of mechanisms and evaluation for challenging workloads on future processors. Our results show that a single 8-core tile (each core 4-way multithreaded) can be almost 100% utilized and sustain 10 million rays/second. Sixteen such tiles, which can fit on a 240 mm² chip in 22 nm technology, make up the system and, with our anticipated improvements in algorithms, can sustain real-time rendering. The mechanisms and the architecture can potentially support other domains such as irregular scientific computations and physics computations.
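The headline figures above can be combined into a rough back-of-the-envelope budget. The sketch below is illustrative only: it assumes perfectly linear scaling from one tile to sixteen, which the abstract does not claim, and the 1024x768 resolution and 30 frames/second target are hypothetical values chosen for the example, not figures from the paper.

```python
# Figures stated in the abstract.
CORES_PER_TILE = 8
THREADS_PER_CORE = 4
TILES = 16
RAYS_PER_SECOND_PER_TILE = 10_000_000

# Hypothetical display target, assumed for illustration only.
ASSUMED_WIDTH = 1024
ASSUMED_HEIGHT = 768
ASSUMED_FPS = 30


def total_cores(tiles=TILES, cores_per_tile=CORES_PER_TILE):
    """Number of cores in the full system."""
    return tiles * cores_per_tile


def total_hardware_threads(tiles=TILES, cores_per_tile=CORES_PER_TILE,
                           threads_per_core=THREADS_PER_CORE):
    """Number of hardware thread contexts in the full system."""
    return tiles * cores_per_tile * threads_per_core


def ideal_aggregate_rays_per_second(tiles=TILES,
                                    per_tile=RAYS_PER_SECOND_PER_TILE):
    """Upper bound on throughput, ASSUMING linear scaling across tiles."""
    return tiles * per_tile


def rays_per_pixel_per_frame(rays_per_second, width=ASSUMED_WIDTH,
                             height=ASSUMED_HEIGHT, fps=ASSUMED_FPS):
    """Ray budget per pixel per frame for an assumed display target."""
    return rays_per_second / (width * height * fps)


if __name__ == "__main__":
    aggregate = ideal_aggregate_rays_per_second()
    print("cores:", total_cores())
    print("hardware threads:", total_hardware_threads())
    print("ideal aggregate rays/second:", aggregate)
    print("rays per pixel per frame:",
          round(rays_per_pixel_per_frame(aggregate), 2))
```

Under these assumptions the ideal aggregate is 160 million rays/second, or a little under 7 rays per pixel per frame at the assumed display target. Secondary rays for soft shadows and reflections draw on the same budget, which is consistent with the abstract's statement that real-time rendering depends on anticipated algorithmic improvements.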
