Distributed-Memory 3D Rendering with Object Migration

Object dataflow is a popular approach in parallel rendering. The data representing the 3D scene is statically distributed among processors, and objects are fetched and cached only on demand. Most previous methods were implemented on shared-memory architectures and exploited only object-space coherency to reduce cache misses. In this paper, we propose an efficient model for object dataflow on distributed-memory machines. We introduce the 'Active Ray Tracing' algorithm, whose ray-stacking mechanism supports latency hiding by postponing computation on inactive rays. We also optimize memory usage by letting objects migrate to and replicate at different processors, rather than relying on the common static assignment. Our cache-only-memory approach uses a distributed-directory scheme to track the location of objects at other nodes. We implemented a mechanism that minimizes network congestion and optimizes channel utilization. Unlike previous methods, our approach can benefit from temporal coherence and effectively eliminates communication costs in successive frames. A volume ray casting instance of the algorithm, implemented on the Cray T3D, demonstrates linear speedup.
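The latency-hiding idea behind ray stacking can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: the function name `render_with_ray_stacking`, the `fetch_latency` parameter, and the representation of a ray as a `(ray_id, object_id)` pair are all hypothetical, and a remote fetch is modeled as a fixed delay measured in loop steps. A ray whose object is not cached is postponed and the processor continues with other rays until the object arrives.

```python
from collections import defaultdict, deque


def render_with_ray_stacking(rays, local_cache, fetch_latency=2):
    """Simulate postponing rays whose objects are not yet cached.

    rays: list of (ray_id, object_id) pairs (hypothetical representation).
    local_cache: iterable of object ids already resident on this node.
    fetch_latency: assumed number of steps a remote fetch takes.
    Returns the ray ids in the order their computation completed.
    """
    active = deque(rays)          # rays ready to be worked on
    waiting = defaultdict(list)   # object_id -> rays postponed on that object
    in_flight = {}                # object_id -> step at which it arrives
    cache = set(local_cache)
    done = []
    step = 0
    while active or waiting:
        # Deliver objects whose fetch has completed and reactivate their rays.
        for obj in [o for o, t in in_flight.items() if t <= step]:
            del in_flight[obj]
            cache.add(obj)
            active.extend(waiting.pop(obj))
        if active:
            ray_id, obj = active.popleft()
            if obj in cache:
                done.append(ray_id)            # cache hit: compute immediately
            else:
                if obj not in in_flight:       # request each object only once
                    in_flight[obj] = step + fetch_latency
                waiting[obj].append((ray_id, obj))  # postpone this ray
        step += 1
    return done


order = render_with_ray_stacking(
    [("r0", "A"), ("r1", "B"), ("r2", "A"), ("r3", "B")],
    local_cache={"A"},
)
print(order)
```

In this toy run, the rays that need the cached object A finish while the request for object B is outstanding, so the fetch delay overlaps with useful work instead of stalling the processor.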
