Recently, a new class of scalable, shared-address-space multiprocessors has emerged. Like message-passing machines, these multiprocessors have a distributed interconnection network and physically distributed main memory. However, they provide hardware support for efficient implicit communication through a shared address space, and they automatically exploit temporal locality by caching both local and remote data in a processor's hardware cache. In this article, we show that these architectural characteristics make it much easier to obtain very good speedups on the best known visualization algorithms. Simple and natural parallelizations work very well, the sequential implementations do not have to be fundamentally restructured, and the high degree of temporal locality obviates the need for explicit data distribution and communication management. We demonstrate our claims through parallel versions of three state-of-the-art algorithms: a recent hierarchical radiosity algorithm by Hanrahan et al. (1991), a parallelized ray-casting volume renderer by Levoy (1992), and an optimized ray-tracer by Spach and Pulleyblank (1992). We also discuss a new shear-warp volume rendering algorithm that provides the first demonstration of interactive frame rates for a 256 × 256 × 256 voxel data set on a general-purpose multiprocessor.
[1] M. Levoy et al. Fast volume rendering using a shear-warp factorization of the viewing transformation. SIGGRAPH, 1994.
[2] Anoop Gupta et al. The directory-based cache coherence protocol for the DASH multiprocessor. ISCA '90, 1990.
[3] Pat Hanrahan et al. A rapid hierarchical radiosity algorithm. SIGGRAPH, 1991.
[4] Donald P. Greenberg et al. A progressive refinement approach to fast radiosity image generation. SIGGRAPH, 1988.
[5] Jaswinder Pal Singh et al. Hierarchical n-body methods and their implications for multiprocessors. 1993.
[6] John L. Hennessy et al. Multiprocessor Simulation and Tracing Using Tango. ICPP, 1991.
[7] Anoop Gupta et al. Working sets, cache sizes, and node granularity issues for large-scale multiprocessors. ISCA '93, 1993.
[8] Marc Levoy et al. Volume rendering on scalable shared-memory MIMD architectures. VVS, 1992.
[9] Derek J. Paddon et al. Parallel Processing of Progressive Refinement Radiosity Methods. 1994.