Parallel volume rendering on a single-chip SIMD architecture

Volume rendering has great potential for parallelization because of the very large number of computations it requires. Besides raw computational power, the memory interface is of crucial importance and is frequently the bottleneck. This paper presents an implementation of a parallel ray casting algorithm for orthogonal projections on a new single-chip SIMD architecture. Concurrent processing of rays is scheduled so that the channel controller can detect redundant memory accesses by the individual processing elements. Hence, data can be read efficiently in a block-wise manner. For improved image quality, a permutation of the Shear-Warp algorithm with trilinear interpolation is used. The steps of the ray casting algorithm are carefully mapped onto the architecture, avoiding expensive floating-point operations and outperforming previously reported results. A detailed analysis illustrates the timing of the individual computations and memory accesses, identifying the costliest parts of the implementation.
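
To make the idea of trilinear interpolation without floating-point operations concrete, the following is a minimal sketch in integer fixed-point arithmetic. It is an illustration only, not the implementation described in the paper: the 8-bit fractional precision (`FRAC_BITS`), the function names, and the nested-list `volume[z][y][x]` layout are all assumptions made for this example.

```python
FRAC_BITS = 8          # assumed fixed-point precision (not taken from the paper)
ONE = 1 << FRAC_BITS   # fixed-point representation of 1.0


def lerp_fixed(a, b, w):
    """Linear interpolation between integers a and b, weight w in [0, ONE)."""
    return a + (((b - a) * w) >> FRAC_BITS)


def trilinear_fixed(volume, x, y, z):
    """Sample volume[z][y][x] at fixed-point coordinates using integers only.

    The caller must keep the coordinates inside the volume so that the
    neighbouring voxel at index + 1 exists along every axis.
    """
    # Split each coordinate into an integer voxel index and a fractional weight.
    xi, yi, zi = x >> FRAC_BITS, y >> FRAC_BITS, z >> FRAC_BITS
    wx, wy, wz = x & (ONE - 1), y & (ONE - 1), z & (ONE - 1)

    # Interpolate along x on the four edges of the surrounding cell.
    c00 = lerp_fixed(volume[zi][yi][xi], volume[zi][yi][xi + 1], wx)
    c01 = lerp_fixed(volume[zi][yi + 1][xi], volume[zi][yi + 1][xi + 1], wx)
    c10 = lerp_fixed(volume[zi + 1][yi][xi], volume[zi + 1][yi][xi + 1], wx)
    c11 = lerp_fixed(volume[zi + 1][yi + 1][xi], volume[zi + 1][yi + 1][xi + 1], wx)

    # Then along y, and finally along z.
    c0 = lerp_fixed(c00, c01, wy)
    c1 = lerp_fixed(c10, c11, wy)
    return lerp_fixed(c0, c1, wz)


# A 2x2x2 test volume whose value depends only on x (0 at x=0, 100 at x=1).
ramp = [[[0, 100], [0, 100]], [[0, 100], [0, 100]]]
sample = trilinear_fixed(ramp, ONE // 2, 0, 0)  # halfway along x
```

Seven integer multiply-and-shift steps replace the floating-point weights; the price is a small rounding error bounded by the chosen fractional precision.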
